(54) METHOD AND DEVICE FOR MULTIPLE-GRANULARITY INDEXING AND FOR SUPPORTING QUERY
EXPANSION WITH EFFICIENT QUERY PROCESSING

(57)Abstract: 

PROBLEM TO BE SOLVED: To expand a query logically rather than physically, to process
queries efficiently while using a small index, to retrieve the related documents without
missing any of them, and to process continuous queries.
SOLUTION: The size of the index is reduced by merging several entries (tuples) into one
entry at a higher level of granularity. During query processing, that tuple is used to
retrieve related documents. Afterwards, the original words of the query are used to rank
the documents to be returned, based on exact matching, semantically similar matching, and
syntactically related matching at the higher granularity. Thus, while the overall accuracy
of the retrieval mechanism is maintained, the size of the index can be reduced and query
processing can be accelerated.
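As a rough illustration of the solution summarized above, the following Python sketch merges word-level index entries into concept-level entries and then ranks the retrieved documents by the original query words. The concept table, the function names, and the scoring rule are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
from collections import defaultdict

# Hypothetical word -> higher-level concept table (illustration only).
CONCEPT_OF = {
    "car": "CAR", "automobile": "CAR", "auto": "CAR", "sedan": "CAR",
    "dealer": "DEALER", "showroom": "DEALER",
}

def build_coarse_index(fine_index):
    """Merge word-level postings into concept-level postings."""
    coarse = defaultdict(set)
    for word, doc_ids in fine_index.items():
        # Words without a known concept keep their own entry.
        coarse[CONCEPT_OF.get(word, word)].update(doc_ids)
    return dict(coarse)

def search(query_words, coarse_index, doc_words):
    """Retrieve by concept, then rank by exact matches of the original words."""
    if not query_words:
        return []
    concepts = [CONCEPT_OF.get(w, w) for w in query_words]
    candidates = set.intersection(
        *(set(coarse_index.get(c, ())) for c in concepts)
    )

    def exact_hits(doc_id):
        return sum(1 for w in query_words if w in doc_words[doc_id])

    # More exact matches first; ties are broken by document id.
    return sorted(candidates, key=lambda d: (-exact_hits(d), d))

fine_index = {"car": {1}, "automobile": {2}, "auto": {2},
              "dealer": {1, 3}, "showroom": {2}}
doc_words = {1: {"car", "dealer"},
             2: {"automobile", "auto", "showroom"},
             3: {"dealer"}}
coarse_index = build_coarse_index(fine_index)  # five word entries become two concept entries
ranked = search(["car", "dealer"], coarse_index, doc_words)
```

In this toy data both documents 1 and 2 are retrieved through the concept-level index, and the original query words are then used only to order them.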
1. This document has been translated by computer. So the translation may not reflect the 
original precisely. 

2. **** shows the word which can not be translated. 
3. In the drawings, any words are not translated. 



[Claim(s)] 

[Claim 1]A method of searching a database of documents that includes a preliminary index of
the words contained in the documents and associations between said index and said documents,
the words in said index being of the original granularity, the method comprising the
following steps.

a) A step which replaces the words in said preliminary index with corresponding higher-level
concepts in order to generate a coarser-granularity index of small size.

b) A step which logically expands a query applied to the database of documents by replacing
the query words of the original granularity with corresponding higher-level concepts.

c) A step which executes said logically expanded query using said coarser-granularity index
and retrieves the documents associated with the corresponding higher-level concepts.

[Claim 2]A search method according to claim 1, further including d) a step which ranks the
retrieved documents in order of relevance.

[Claim 3]A search method according to claim 2, wherein the documents retrieved are ranked in
said ranking step using the query words of the original granularity.

[Claim 4]A search method according to claim 3, wherein the order of relevance is: first, the
case where a query word and a word contained in a retrieved document match exactly; next,
the case where they match semantically; next, the case where they match syntactically; and
last, the case where they do not match.

[Claim 5]A search method according to claim 1, wherein in said replacement step the
higher-level concepts are higher-level semantic concepts.

[Claim 6]A search method according to claim 5, wherein each of the higher-level semantic
concepts contains synonyms.
[Claim 7]A search method according to claim 1, wherein in said replacement step only those
words in the preliminary index that meet a predetermined criterion are replaced with
corresponding higher-level concepts.

[Claim 8]A search method according to claim 7, wherein said predetermined criterion is based
on whether said word is in a term dictionary.

[Claim 9]A search method according to claim 1, wherein in said replacement step said
higher-level concepts are higher-level syntactic concepts.

[Claim 10]A search method according to claim 9, wherein each of said higher-level syntactic
concepts includes words that co-occur within a document more often than a certain frequency
level.

[Claim 11]A search method according to claim 1, wherein said step which logically expands a
query further has b(i) a step which replaces only the query words that meet a predetermined
criterion with corresponding higher-level concepts that are higher-level semantic concepts.

[Claim 12]A search method according to claim 11, wherein said step which logically expands a
query further comprises:

b(ii) A step which further logically expands said query by adding syntactically related
words to each of said corresponding higher-level concepts.

b(iii) A step which further logically expands said query by adding syntactically related
words to each of the query words that do not meet said predetermined criterion.

[Claim 13]A search method according to claim 12, wherein said step which logically expands a
query further comprises:

b(iv) A step which replaces said syntactically related words that meet the predetermined
criterion with corresponding higher-level concepts.

b(v) A step which removes from the expanded query the portions that are redundant among said
syntactically related words and said higher-level concepts.

[Claim 14]A search method according to claim 13, wherein said predetermined criterion is
based on whether said word is in a term dictionary.

[Claim 15]A search method according to claim 1, wherein in said replacement step a word in
said preliminary index that has two or more meanings is replaced with two or more
corresponding higher-level concepts.

[Claim 16]A search method according to claim 12, wherein the words that do not meet said
predetermined criterion are proper nouns.

[Claim 17]A search method according to claim 1, wherein said execution step is continued in
successive stages until a predetermined number of documents associated with the
corresponding higher-level concepts have been retrieved.

[Claim 18]A search method according to claim 17, wherein each said stage represents one
expansion class.

[Claim 19]A search method according to claim 17, wherein each said stage represents one slot
in one expansion class.

[Claim 20]A search method according to claim 17, wherein in each stage documents are
retrieved in an order reflecting the level of importance assigned to at least one word in
the query.
[Claim 21]A method of searching a database of documents that includes a small-size index of
higher-level concepts corresponding to the words of the original granularity contained in
the documents, and associations between said index and said documents, the method comprising
the following steps.

a) A step which logically expands a query applied to the database of documents by replacing
the query words of the original granularity with corresponding higher-level concepts.

b) A step which executes said logically expanded query using said index and retrieves the
documents associated with the corresponding higher-level concepts.

[Claim 22]A search method according to claim 21, wherein said step which logically expands a
query further has a(i) a step which replaces only the query words that meet a predetermined
criterion with corresponding higher-level concepts that are higher-level semantic concepts.

[Claim 23]A search method according to claim 22, wherein each of the higher-level semantic
concepts contains synonyms.

[Claim 24]A search method according to claim 21, wherein said higher-level concepts are
higher-level syntactic concepts.

[Claim 25]A search method according to claim 24, wherein each of said higher-level syntactic
concepts includes words that co-occur within a document more often than a certain frequency
level.

[Claim 26]A search method according to claim 22, wherein said step which logically expands a
query further comprises:

a(ii) A step which further logically expands said query by adding syntactically related
words to each of said corresponding higher-level concepts.

a(iii) A step which further logically expands said query by adding syntactically related
words to each of the query words that do not meet said predetermined criterion.

[Claim 27]A search method according to claim 26, wherein said step which logically expands a
query further comprises:

a(iv) A step which replaces said syntactically related words that meet the predetermined
criterion with corresponding higher-level concepts.

a(v) A step which removes from the expanded query the portions that are redundant among said
syntactically related words and said higher-level concepts.
[Claim 28]A search method according to claim 27, wherein said predetermined criterion is
based on whether said word is in a term dictionary.

[Claim 29]A search method according to claim 26, wherein the words that do not meet said
predetermined criterion are proper nouns.

[Claim 30]A search method according to claim 26, wherein each of said syntactic concepts
includes words that co-occur within a document more often than a certain frequency level.

[Claim 31]A search method according to claim 21, further including c) a step which ranks the
retrieved documents in order of relevance.

[Claim 32]A search method according to claim 31, wherein said retrieved documents are ranked
using the query words of the original granularity.

[Claim 33]A search method according to claim 32, wherein the order of relevance is: first,
the case where a query word and a word contained in a retrieved document match exactly;
next, the case where they match semantically; next, the case where they match syntactically;
and last, the case where they do not match.

[Claim 34]A search method according to claim 21, wherein a word of the original granularity
contained in a document corresponds to two or more higher-level concepts.

[Claim 35]A search method according to claim 21, wherein said execution step is continued in
successive stages until a predetermined number of documents associated with the
corresponding higher-level concepts have been retrieved.

[Claim 36]A search method according to claim 35, wherein each said stage represents one
expansion class.

[Claim 37]A search method according to claim 35, wherein each said stage represents one slot
in one expansion class.

[Claim 38]A search method according to claim 35, wherein in each stage documents are
retrieved in an order reflecting the level of importance assigned to at least one word in
the query.
[Claim 39]A system for searching a database of documents that includes a preliminary index
of the words contained in the documents and associations between said index and said
documents, the words in said index being of the original granularity, the system comprising
the following.

a) An indexer which replaces the words in said preliminary index with corresponding
higher-level concepts in order to generate a coarser-granularity index of small size.

b) A user interface for providing a query applied to said database of documents.

c) A processor which logically expands said query by replacing the query words of the
original granularity with corresponding higher-level concepts, executes said logically
expanded query using the coarser-granularity index, and retrieves the documents associated
with the corresponding higher-level concepts.

[Claim 40]A search system according to claim 39, wherein said processor ranks the retrieved
documents in order of relevance.

[Claim 41]A search system according to claim 40, wherein said processor ranks the retrieved
documents using the query words of the original granularity.

[Claim 42]A search system according to claim 41, wherein the order of relevance is: first,
the case where a query word and a word contained in a retrieved document match exactly;
next, the case where they match semantically; next, the case where they match syntactically;
and last, the case where they do not match.

[Claim 43]A search system according to claim 39, wherein said higher-level concepts are
higher-level semantic concepts.

[Claim 44]A search system according to claim 43, wherein each of said higher-level semantic
concepts contains synonyms.

[Claim 45]A search system according to claim 39, wherein said indexer replaces only those
words in the preliminary index that meet a predetermined criterion with corresponding
higher-level concepts.

[Claim 46]A search system according to claim 45, wherein said predetermined criterion is
based on whether said word is in a term dictionary.

[Claim 47]A search system according to claim 39, wherein said higher-level concepts are
higher-level syntactic concepts.

[Claim 48]A search system according to claim 47, wherein each of said higher-level syntactic
concepts includes words that co-occur within a document more often than a certain frequency
level.

[Claim 49]A search system according to claim 39, wherein said processor further logically
expands the query by c(i) replacing only the query words that meet a predetermined criterion
with corresponding higher-level concepts that are higher-level semantic concepts.

[Claim 50]A search system according to claim 49, wherein said processor further logically
expands said query by c(ii) adding syntactically related words to each of said corresponding
higher-level concepts and c(iii) adding syntactically related words to each of the query
words that do not meet said predetermined criterion.

[Claim 51]A search system according to claim 50, wherein said processor further logically
expands said query by c(iv) replacing said syntactically related words that meet the
predetermined criterion with corresponding higher-level concepts and c(v) removing from the
expanded query the portions that are redundant among said syntactically related words and
said higher-level concepts.

[Claim 52]A search system according to claim 51, wherein said predetermined criterion is
based on whether said word is in a term dictionary.
[Claim 53]A search system according to claim 39, wherein a word in said preliminary index
that has two or more meanings is replaced with two or more corresponding higher-level
concepts.

[Claim 54]A search system according to claim 50, wherein the words that do not meet said
predetermined criterion are proper nouns.

[Claim 55]A search system according to claim 39, wherein execution of said query is
continued in successive stages until a predetermined number of documents associated with the
corresponding higher-level concepts have been retrieved.

[Claim 56]A search system according to claim 55, wherein each said stage represents one
expansion class.

[Claim 57]A search system according to claim 55, wherein each said stage represents one slot
in one expansion class.

[Claim 58]A search system according to claim 55, wherein in each stage documents are
retrieved in an order reflecting the level of importance assigned to at least one word in
the query.
[Claim 59]A system for searching a database of documents that includes a small-size index of
higher-level concepts corresponding to the words of the original granularity contained in
the documents, and associations between said index and said documents, the system comprising
the following.

a) A user interface for providing a query applied to said database of documents.

b) A processor which logically expands said query by replacing the query words of the
original granularity with corresponding higher-level concepts, executes said logically
expanded query using said index, and retrieves the documents associated with the
corresponding higher-level concepts.

[Claim 60]A search system according to claim 59, wherein said processor further logically
expands said query by b(i) replacing only the query words that meet a predetermined
criterion with corresponding higher-level concepts that are higher-level semantic concepts.

[Claim 61]A search system according to claim 60, wherein each of the higher-level semantic
concepts contains synonyms.

[Claim 62]A search system according to claim 59, wherein said higher-level concepts are
higher-level syntactic concepts.

[Claim 63]A search system according to claim 62, wherein each of said higher-level syntactic
concepts includes words that co-occur within a document more often than a certain frequency
level.

[Claim 64]A search system according to claim 60, wherein said processor further logically
expands said query by b(ii) adding syntactically related words to each of said corresponding
higher-level concepts and b(iii) adding syntactically related words to each of the query
words that do not meet said predetermined criterion.

[Claim 65]A search system according to claim 64, wherein said processor further logically
expands said query by b(iv) replacing said syntactically related words that meet the
predetermined criterion with corresponding higher-level concepts and b(v) removing from the
expanded query the portions that are redundant among said syntactically related words and
said higher-level concepts.

[Claim 66]A search system according to claim 65, wherein said predetermined criterion is
based on whether said word is in a term dictionary.

[Claim 67]A search system according to claim 64, wherein the words that do not meet said
predetermined criterion are proper nouns.

[Claim 68]A search system according to claim 64, wherein each of said syntactic concepts
includes words that co-occur within a document more often than a certain frequency level.
[Claim 69]A search system according to claim 59, wherein said processor further ranks the
retrieved documents in order of relevance.

[Claim 70]A search system according to claim 69, wherein said retrieved documents are ranked
using the query words of the original granularity.

[Claim 71]A search system according to claim 70, wherein the order of relevance is: first,
the case where a query word and a word contained in a retrieved document match exactly;
next, the case where they match semantically; next, the case where they match syntactically;
and last, the case where they do not match.

[Claim 72]A search system according to claim 59, wherein a word of the original granularity
contained in a document corresponds to two or more higher-level concepts.

[Claim 73]A search system according to claim 59, wherein execution of said query is
continued in successive stages until a predetermined number of documents associated with the
corresponding higher-level concepts have been retrieved.

[Claim 74]A search system according to claim 73, wherein each said stage represents one
expansion class.

[Claim 75]A search system according to claim 73, wherein each said stage represents one slot
in one expansion class.

[Claim 76]A search system according to claim 73, wherein in each stage documents are
retrieved in an order reflecting the level of importance assigned to at least one word in
the query.






[Detailed Description of the Invention] 
[0001] 

[Field of the Invention]This invention relates generally to the field of indexes and queries
applied to collections of documents in a database. More specifically, it relates to reducing
the size of the index used for query expansion and processing, to performing query expansion
efficiently, and to processing continuous queries.
[0002] 

[Description of the Prior Art]Conventional document retrieval systems are based on the
common principle and methodology of classifying documents by applying a query. Documents are
specified a priori by an expert or a librarian, and are usually indexed manually using
controlled terms. Documents may also be indexed based on the words they contain. A user
retrieves documents by choosing words from the permitted terms and connecting them with
suitable Boolean operators. A strict (exact) matching strategy is used in this type of
system. Although this approach has the advantages of being simple and highly precise, it
suffers from the word mismatch problem.

[0003]The word mismatch problem in information retrieval arises when an author uses one word
in a document to express a concept and a user uses a different word in a query to specify
the same concept. Drawing 1 shows that the words used in HyperText Markup Language (HTML)
documents related to "car (passenger car)" and "dealer (store)" may differ among documents.
Languages other than HTML, such as Extensible Markup Language (XML) and Standard Generalized
Markup Language (SGML), are also used. When a user uses the words "automobile (car)" and
"dealer (store)" in a query, some of the target documents cannot be retrieved because of the
word mismatch problem.

[0004]In this specification, since the documents to be searched are assumed to be mainly in
English, each element of the queries used for search is written in English. However, these
can also be expressed in any language according to the user's needs. Here, the Japanese
meaning of an element is given in parentheses (as necessary) following the element written
in English. The Japanese in parentheses only explains the meaning of the query element and
does not affect the result of the query.

[0005]Query expansion has been suggested as a technique for solving this problem. This
approach expands a query by adding, as words in the query, words with similar meanings (for
example, synonyms or other words with related meanings) and syntactically related words (for
example, a group of words that appear together in the same document above a fixed frequency
are syntactically co-occurring words). In this way, query expansion increases the likelihood
of matching the words in related documents. With query expansion, a query containing the
words "car dealer (store of a passenger car)" is expanded to include terms of the same
meaning, as follows.
[0006]Line 1. [("car (passenger car)" OR "automobile (car)" OR "auto (car)" OR "sedan (sedan)") OR
Line 2. ("Ford (Ford car)" OR "Buick (Buick car)")] AND
Line 3. ("dealer (store)" OR "Showroom (showroom)" OR "SalesOffice (sales store)").
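The expanded Boolean query above can be sketched in Python as a list of OR-groups that are ANDed together. The expansion tables and the function names `expand` and `matches` are hypothetical; the terms themselves are the ones used in the example.

```python
# Hypothetical expansion tables built from the example terms (illustration only).
SEMANTIC = {
    "car": ["automobile", "auto", "sedan"],
    "dealer": ["showroom", "salesoffice"],
}
SYNTACTIC = {"car": ["ford", "buick"]}

def expand(query_words):
    """Return one OR-group (a set of terms) per query word; the groups are ANDed."""
    groups = []
    for word in query_words:
        group = {word}
        group.update(SEMANTIC.get(word, []))   # semantically similar words
        group.update(SYNTACTIC.get(word, []))  # syntactically co-occurring words
        groups.append(group)
    return groups

def matches(groups, doc_words):
    """A document matches if it contains at least one term from every OR-group."""
    return all(group & doc_words for group in groups)

groups = expand(["car", "dealer"])
hit = matches(groups, {"buick", "showroom"})   # no original query word, still a match
miss = matches(groups, {"buick", "garage"})    # nothing from the "dealer" group
```

A two-word query becomes nine terms here, which is the growth in query length that the later paragraphs identify as a processing cost.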

[0007]There are two types of query expansion in the above example. The expansion in line 1
and line 3 adds words related to "car" and "dealer" by definition; that is, semantically
similar words are added. "automobile", "auto", and "sedan" are words with meanings similar
to "car." Similarly, "Showroom" and "SalesOffice" are words with meanings similar to
"dealer." The other type of query expansion is shown in line 2. It is based on the syntactic
co-occurrence relation. Many words used on the World Wide Web (also simply called the web)
are actually proper nouns that are not found in a term dictionary, for example Ford, Buick,
NBA, and NFL (National Football League). As mentioned above, the syntactic co-occurrence
relation is derived by analyzing the frequency with which two words appear together in the
same document. This rests on the assumption that two words are likely to be related when
they frequently appear in the same document. For example, the words that co-occur with
"Ford" may include "dealer (store)", "body shop (body repair shop)", "Mustang (Mustang: name
of a car by Ford Co.)", and "Escort (Escort: name of a car by Ford Co.)".
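The co-occurrence analysis described above can be sketched as counting, for every pair of words, the number of documents that contain both, and keeping the pairs above a threshold. The function name, the threshold parameter `min_count`, and the sample documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurring_pairs(docs, min_count):
    """Return the word pairs that appear together in at least min_count documents."""
    counts = Counter()
    for words in docs:
        # Each unordered pair is counted once per document.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

docs = [
    ["ford", "dealer", "mustang"],
    ["ford", "dealer"],
    ["buick", "dealer"],
]
related = cooccurring_pairs(docs, min_count=2)
```

Because every pair of words is a potential entry, the number of pairs grows roughly quadratically with the vocabulary, which is why an index of such binary relations becomes large.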

[0008]In order to support query expansion, indexes of words related by definition and by
syntactic relations such as co-occurrence information must be maintained appropriately. An
index of words related by definition is organized as a hierarchical structure, a semantic
network, or a hierarchical cluster of related words. For hierarchical structures, refer to
W. Li et al., "Facilitating Multimedia Database Exploration through Visual Interfaces and
Perpetual Query Reformulations", Proceedings of the 23rd International Conference on Very
Large Data Bases, pages 538-547, Athens, Greece, August 1997. For semantic networks, refer
to G. A. Miller, "Nouns in WordNet: A Lexical Inheritance System", International Journal of
Lexicography 3(4), pages 245-264, 1990. For hierarchical clusters of related words, refer to
G. Salton et al., "The SMART and SIRE Experimental Retrieval Systems", pages 118-155,
McGraw-Hill, New York, 1983. Since syntactic relations such as the co-occurrence relation
are expressed as binary relations, the size of a syntactic-relation index is extremely
large. Several techniques have been proposed to solve this problem. Refer to G.
Grefenstette, "Use of syntactic context to produce term association lists for text
retrieval", Proceedings of the Fifteenth Annual International ACM SIGIR Conference, Denmark,
1992; J. Xu et al., "Query Expansion Using Local and Global Document Analysis", Proceedings
of the 19th Annual International ACM SIGIR Conference, Zurich, Switzerland, 1996; and C.
Jacquemin, "Guessing Morphology from Terms and Corpora", Proceedings of the 20th Annual
International ACM SIGIR Conference, Philadelphia, Pennsylvania, USA, 1997. Such techniques
include analysis of occurrence frequency and the use of morphological rules (for example,
converting every word to its base form) or of a term dictionary.
[0009]Considerable research on the word mismatch problem has been done in the field of
information retrieval (IR). Refer to G. Salton et al., "Introduction to Modern Information
Retrieval", McGraw-Hill Book Company, 1983; G. Salton, "Automatic Text Processing: The
Transformation, Analysis, and Retrieval of Information by Computer", Addison-Wesley
Publishing Company, Inc., 1989; and K. Sparck Jones et al., "Readings in Information
Retrieval", Morgan Kaufmann, San Francisco, California, USA, 1997.

[0010]However, most of this research focuses on retrieval criteria such as precision and
recall. Although some research has suggested how to support query expansion efficiently or
has proposed indexing mechanisms (refer to C. Buckley et al., "Automatic Query Expansion
Using SMART", Proceedings of the 3rd Text Retrieval Conference, Gaithersburg, Maryland,
1993), two problems still lack a satisfactory solution. The first problem is that a
collection of documents (for example, the web) contains many distinct words, many of which
are proper nouns, and since each word has many semantically similar words and syntactically
related words, the index becomes very large. The second problem is that, since a query is
expanded with additional words, the cost of processing the query becomes high.

JP,2000-137738,A [DETAILED DESCRIPTION]

[0011]These problems become increasingly pronounced when dealing with document information
collected from the web, because the number of documents grows dramatically and the words in
use are highly varied, inconsistent, and occasionally wrong (for example, typing errors).
According to one study, most user queries on the web have two words. Refer to B. Croft et
al., "Providing Government Information on the Internet: Experiences with THOMAS",
Proceedings of Digital Libraries (DL95), 1995. If query expansion is used, however, queries
become substantially longer. As a result, most existing search engines on the web cannot
provide a query expansion function.

[0012]Existing research in the field of query expansion is outlined here. Query expansion
has attracted considerable attention in the field of IR. However, the focus so far has been
on evaluating how much query expansion improves the retrieval criteria (namely, precision
and recall). Other research has focused on building dictionaries in order to identify a set
of similar terms for a given query word. However, previous research has not addressed the
problems of processing a query efficiently once it has been expanded and of reducing the
size of the index used for query expansion and processing. The problem of ranking documents
based on both exact matching and similarity matching also remains a difficult one.
[0013]SMART is one of the best-known advanced information retrieval systems. Refer to R. T.
Dattola, "Experiments with a fast algorithm for automatic classification", Chapter 12 of
The SMART Retrieval System - Experiments in Automatic Document Processing, edited by Gerard
Salton, Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1971, and to "The SMART and SIRE
Experimental Retrieval Systems" by G. Salton et al. in the above-mentioned literature. In
SMART, each document is represented by a term vector. Each position of the vector represents
the weight (importance) of the corresponding term in the document, so a collection of M
documents with N distinct terms is represented by an M x N matrix. A query is also
represented as a term vector. Documents are retrieved by computing a similarity
corresponding to the cosine of the query vector and each document vector. INQUERY is another
well-known system. Refer to J. Callan et al., "TREC and TIPSTER experiments with INQUERY",
Information Processing and Management, 3:327-332, 1995.
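The vector-space retrieval described for SMART can be illustrated with a small cosine-similarity ranking. This is a generic sketch of the cosine measure, not SMART's actual implementation; the function names and the toy 3 x 3 term-weight matrix are assumptions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_matrix):
    """Return document row indices ordered by decreasing cosine similarity."""
    scored = [(cosine(query_vec, row), i) for i, row in enumerate(doc_matrix)]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]

# M = 3 documents, N = 3 terms; each entry is a term weight.
doc_matrix = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 2],
]
order = rank([1, 1, 0], doc_matrix)
```

A document that shares no term with the query scores zero, which is the word mismatch problem restated in vector terms.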

[0014]Latent semantic indexing (LSI) is a technique that relies on statistically derived
conceptual indexes rather than on dictionary-like matching of individual terms. Refer to R.
Harshman et al., "Indexing by latent semantic analysis", Journal of the American Society for
Information Science, 41:391-407, 1990, and M. W. Berry et al., "Computational Methods for
Intelligent Information Access", Proceedings of the 1995 ACM Conference on Supercomputing,
1995. LSI assumes that there is some hidden structure, i.e., a latent structure, in word
usage, and that this structure needs to be exposed by analyzing the occurrence of words in
documents. A document is therefore regarded as a vector in a very high-dimensional term
space, and each element of the vector represents the occurrence frequency of a specific term
in the given document. More refined measures based on global and local weighting may also be
used. A truncated singular value decomposition (SVD) is used to estimate the structure of
word usage across documents. Refer to G. Golub et al., "Matrix Computations", 2nd edition,
Johns Hopkins, Baltimore, Maryland, USA, 1989. Retrieval is then performed using the
database of singular values and vectors obtained from the truncated SVD. In preliminary
evaluations of LSI, this approach to information retrieval is regarded as a coarser measure
than one based on individual terms.

[0015]Automatic query expansion has long been suggested as a technique for dealing with the
word mismatch problem. See E. Voorhees, "Query Expansion Using Lexical-Semantic Relations",
Proceedings of the 17th Annual International ACM SIGIR Conference, Dublin, Ireland, 1994. In
one approach, a query is expanded using a thesaurus, which improves the chance that a word of
the query matches a word in a relevant document. Research has shown that the improvement is
limited when only a general-purpose thesaurus is used. Many more innovative techniques have
also been proposed. See O. Kwon et al., "Query Expansion Using Domain Adapted, Weighted
Thesaurus in an Extended Boolean Model", Proceedings of the 3rd International Conference on
Information and Knowledge Management, 1994; E. Voorhees, "Concept Based Query Expansion",
Proceedings of the 16th Annual International ACM SIGIR Conference, Pittsburgh, Pennsylvania,
1993; E. Voorhees, "Query Expansion Using Lexical-Semantic Relations", in the above-cited
proceedings; and M. W. Berry et al., "Computational Methods for Intelligent Information
Access", in the above-cited proceedings. Experiments show that automatic query expansion
improves retrieval effectiveness by 7% to 25% on average. See C. Buckley et al., "Automatic
Query Expansion Using SMART", in the above-cited proceedings.

[0016]A query can also be improved by including syntactically related words. This approach
clusters words based on their co-occurrence within documents and expands a query using these
clusters. Because co-occurrence is a binary relation, such an index is always very large. One
group used corpus-specific co-occurrence statistics on word variants to modify or build a
stemmer, and showed that this has an advantage over an approach that uses morphological rules
alone. See W. B. Croft et al., "Corpus-Specific Stemming Using Word Form Co-occurrence",
Proceedings of the Fourth Annual Symposium, 1994. The techniques above, which expand the terms
of a query into a set of semantically related terms, are called global analysis. In query
expansion, terms obtained from relevance feedback are also added to the query to improve
retrieval effectiveness. See G. Salton et al., "Improving retrieval performance by relevance
feedback", Journal of the American Society for Information Science, 41(4):288-297, June 1990.
This is called local analysis. Earlier research shows that applying global analysis techniques,
which use word context and phrase structure, to a local set of documents gives retrieval
results that are more effective and more reliable than simple local feedback. For details, see
J. Xu et al., "Query Expansion Using Local and Global Document Analysis", in the
above-mentioned literature.
[0017]However, as described above, the earlier research does not aim at solving the problem of
processing a query efficiently once it has been expanded, or at reducing the size of the index
used for query expansion and query processing.
[0018] 

[Problem(s) to be Solved by the Invention]The purpose of this invention is to provide a method
and device that perform efficient query expansion using a small index and process continuous
queries, in order to solve the word mismatch problem and the resulting inefficiency of query
processing. More specifically, a query is expanded conceptually rather than physically, using
words that are semantically similar or syntactically related to the words specified in the
query, so that fewer relevant documents are missed.
[0019]To support query expansion, indexes of semantically related words and of words in
syntactic co-occurrence relations must be maintained, and two problems become important in
supporting such query expansion. The first is the size of the index tables, and the second is
the overhead of query processing. This invention also aims to solve these problems.
[0020] 

[Means for Solving the Problem]According to this invention, concepts and information
structures at multiple granularities are used to support query expansion. The invention
comprises an indexing phase, a query processing phase, and a ranking phase. In the indexing
phase, semantically similar words are grouped into one concept; because semantic concepts are
coarser-grained than words, the actual index becomes smaller. During query processing, the
words in a query are mapped to the corresponding semantic concepts and syntactic expansions
using the dictionary and the actual data contents, so that the original query is expanded
logically and processing overhead is avoided. The words of the original query are then used to
rank the documents obtained as search results based on exact matching, semantic matching, and
syntactic matching, and are also used to process continuous queries.
[0021] 

[Embodiment of the Invention]A method for expanding queries efficiently according to this
invention, and a preferred embodiment of the device, are described in detail below with
reference to the accompanying drawings. Although the following description refers to NEC's
PERCIO object-oriented database management system (OODBMS), it should be noted that this
invention is not restricted to it. The invention can be applied to various database systems
and document collections.

[0022]This invention provides effective indexing and processing support for query expansion by
introducing the concept of multiple granularities. The approach of this invention builds
indexes for semantically similar words, after stemming them with available techniques, and for
syntactically related words. For these techniques, see J. Xu et al., "Query Expansion Using
Local and Global Document Analysis", Proceedings of the 19th Annual International ACM SIGIR
Conference, Zurich, Switzerland, 1996, and C. Jacquemin, "Guessing Morphology from Terms and
Corpora", Proceedings of the 20th Annual International ACM SIGIR Conference, Philadelphia,
Pennsylvania, 1997. The approach reduces the size of the index by merging several entries
(tuples) into one entry at a higher-level granularity. During query processing, the tuples
carrying the higher-level granularity information are used to retrieve relevant documents.
The original words of the query, which are at the finer granularity, are then used to rank the
documents retrieved during query processing, based on exact matching, semantically similar
matching, and syntactically related matching. By using indexes and a query processing
technique with multiple granularities, the index can be made smaller and query processing
faster while the overall accuracy of the retrieval mechanism is maintained.
[0023]First, it is explained how the notion of multiple granularities fits in with the
conventional indexing used by almost all IR systems. Next, the storage overhead of
multi-granularity indexing is estimated for a given set of documents.

[0024]To retrieve the list of documents for a given word easily, a conventional IR system
maintains an index, and also extracts the set of words associated with each retrieved
document. Note that the term "document" here refers to text, an image, or a combination of
text and images.

[0025]Drawing 2 shows an example of such indexes. The table in drawing 2(b) is the inverted
form of the table in drawing 2(a). In drawing 2, these indexes are shown as tables for ease of
explanation. In an actual environment, however, they are implemented, for example, as classes
on top of NEC's PERCIO OODBMS. As an example, when a user creates a query using the words "car"
and "dealer", the IR system retrieves the document lists from the corresponding rows of the
table in drawing 2(b). The answer to the query is the intersection of the document lists
obtained from the two rows. This approach to IR clearly supports only exact matching; relevant
documents containing terms with similar meanings, such as "automobile dealer", "car showroom",
or "automobile showroom", cannot be retrieved. Query expansion is therefore used together with
a special utility that expands the query specification "car" and "dealer" into the
specification ("car" or "automobile") and ("dealer" or "showroom"). Although this approach is
feasible, it adds considerable overhead to query processing. In particular, instead of the two
lookups in the index table of drawing 2(b) needed for the words of the original query, several
lookups are needed, one for each semantically similar word. In addition, a thesaurus tool such
as an on-line dictionary is required to expand the query terms into their semantically similar
terms. Based on these observations, this invention provides a more effective way of supporting
query expansion when a set of documents is searched.
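
The lookup overhead described in this paragraph can be made concrete with a small sketch. The
index contents, the thesaurus, and the function names below are hypothetical and stand in for
the tables of drawing 2(b) and an on-line dictionary; they are not the data of the drawings.

```python
# Hypothetical word -> document-list index in the style of drawing 2(b).
WORD_INDEX = {
    "car": {1, 2},
    "automobile": {3},
    "dealer": {1, 4},
    "showroom": {3, 5},
}

# Hypothetical thesaurus used for physical query expansion.
SYNONYMS = {
    "car": ["car", "automobile"],
    "dealer": ["dealer", "showroom"],
}

def exact_match(words):
    """Conventional processing: one lookup per query word, then intersect."""
    lists = [WORD_INDEX.get(w, set()) for w in words]
    return set.intersection(*lists) if lists else set()

def physically_expanded_match(words):
    """Expand each word into its synonyms and count the index lookups needed."""
    lookups = 0
    result = None
    for w in words:
        docs = set()
        for alt in SYNONYMS.get(w, [w]):
            docs |= WORD_INDEX.get(alt, set())  # OR within a synonym group
            lookups += 1
        result = docs if result is None else result & docs  # AND across words
    return (result or set()), lookups

EXACT = exact_match(["car", "dealer"])
EXPANDED, LOOKUPS = physically_expanded_match(["car", "dealer"])
```

With this toy data the exact query finds only document 1, while the expanded query also finds
document 3 ("automobile showroom") at the price of four index lookups instead of two.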
[0026]As stated previously, to avoid a mismatch between the user's vocabulary and the author's
vocabulary, a query needs to be expanded using words with similar meanings and words that are
syntactically related.
[0027]Drawing 3 shows the additional data structures needed to facilitate query expansion in a
conventional IR system. In particular, drawing 3(a) shows a table derived from an on-line
dictionary, in which the words, together with their definitions, are grouped into semantically
similar concepts. The table in drawing 3 is simplified for explanation. For example, the group
of similar terms "car", "auto", "automobile", and "sedan" is represented as one symbolic
entity, Sem1. Unlike semantic similarity, which is based on a dictionary or thesaurus,
syntactic relations in IR are determined by the document collection itself. In particular, word
co-occurrence information is used to relate two words syntactically. Drawing 3(b) illustrates
an index representing this information. With the auxiliary indexes of drawing 3 used together
with the conventional IR indexes of drawing 2, basic query expansion is supported in an IR
system. Basically, given a user's query, the word list of the query is expanded to include
semantically similar words and syntactically related words.
[0028]Although the method described above can process queries with query expansion, its
processing overhead is high. This invention uses additional index structures that allow
queries to be processed more efficiently. The basic idea of the approach is to transform the
indexes of drawing 2 and drawing 3 so that a query is expanded conceptually. That is, the word
list of the query is not expanded physically by adding semantically similar and syntactically
related words to it; instead, the query is expanded conceptually by converting its words into
the related higher-level semantic concepts and syntactically related (for example,
co-occurring) words. The additional index structures add storage overhead. However, because
user queries are processed more efficiently, there is an overall saving.

[0029]As described above, the index tables are transformed as shown in drawing 4 in order to
process expanded queries. In particular, the index table in drawing 4(a) is derived from
drawing 2(a) by replacing each word (other than a proper noun) with its higher-level semantic
concept. The index table in drawing 4(b) is obtained from drawing 2(b) by combining the words
into their corresponding higher-level semantic concepts and merging their document lists.
Thus, the rows for "car", "auto", "automobile", and "sedan" are represented in drawing 4(b) by
the single entry Sem1. Similarly, the rows for "dealer", "showroom", and "SalesOffice" in
drawing 2(b) are combined into one row labeled Sem2.
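
The derivation of a concept-level index from a word-level index can be sketched as follows.
This is an illustration under stated assumptions: the document identifiers and the function
name are hypothetical, and only the grouping of words into Sem1 and Sem2 follows the example
in the text.

```python
# Word -> concept mapping in the style of drawing 3(a); groups follow the text.
WORD_TO_CONCEPT = {
    "car": "Sem1", "auto": "Sem1", "automobile": "Sem1", "sedan": "Sem1",
    "dealer": "Sem2", "showroom": "Sem2", "SalesOffice": "Sem2",
}

# Hypothetical word -> document-list index in the style of drawing 2(b).
WORD_INDEX = {
    "car": {1},
    "auto": {2},
    "dealer": {1, 3},
    "showroom": {2},
    "Ford": {4},  # proper noun, not in the dictionary
}

def build_concept_index(word_index, word_to_concept):
    """Merge rows whose words share a concept; union their document lists."""
    concept_index = {}
    for word, docs in word_index.items():
        key = word_to_concept.get(word, word)  # proper nouns keep their own row
        concept_index.setdefault(key, set()).update(docs)
    return concept_index

CONCEPT_INDEX = build_concept_index(WORD_INDEX, WORD_TO_CONCEPT)
# A query on "car" and "dealer" now needs one lookup per concept.
ANSWER = CONCEPT_INDEX["Sem1"] & CONCEPT_INDEX["Sem2"]
```

The merged index has three rows instead of five, and the two concept lookups already cover
every synonym of the query words.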

[0030]The index for syntactically related words is usually considerably larger than the index
for semantically related words, for several reasons. Many words on the Web are proper nouns
and are not found in a dictionary. In an experiment analyzing 2,904 documents, 42% of the
keywords were found in WordNet. WordNet is an on-line dictionary containing more than 60,000
words. See G. A. Miller, "Nouns in WordNet: A Lexical Inheritance System", International
Journal of Lexicography 3(4), pp. 245-264, 1990. The remaining 58% of the words include proper
nouns and typographical errors, which inflate the size of the index. In a conventional IR
system, syntactic correlation is usually captured by co-occurrence relations. Since
co-occurrence of words within the same document is a pairwise relation, when n words are
identified the size of the index is n(n-1)/2 in the worst case. Indexing co-occurrence
relations among three or more words is extremely costly because of the huge storage and
indexing overhead.
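
As a small illustration of the n(n-1)/2 bound, the sketch below builds a pairwise
co-occurrence index from a hypothetical two-document collection. The data and function names
are assumptions made for this example only.

```python
from itertools import combinations

def cooccurrence_index(documents):
    """Map each unordered word pair to the set of documents containing both words."""
    index = {}
    for doc_id, words in documents.items():
        for pair in combinations(sorted(set(words)), 2):
            index.setdefault(pair, set()).add(doc_id)
    return index

def worst_case_entries(n):
    """Number of distinct pairs among n words, i.e. n(n-1)/2."""
    return n * (n - 1) // 2

# Hypothetical collection with four distinct words in total.
DOCUMENTS = {
    1: ["car", "dealer", "Ford"],
    2: ["car", "garage"],
}
PAIR_INDEX = cooccurrence_index(DOCUMENTS)
```

Here four of the six possible pairs actually occur; a single document containing all four
words would reach the worst case of six entries.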

[0031]Let S denote the words found in the dictionary (those that are semantically meaningful)
and P denote all other words (proper nouns). Based on this classification into dictionary
words and non-dictionary words, the co-occurrence relations between words fall into three
categories.

[0032]- P-P type: for example, (Toyota, Avalon), (Acura, Legend), (Nissan, Maxima), where the
second word of each pair is the name of a car made by the first.

[0033]- S-P type or P-S type: for example, (Buick, car), (Buick, dealer), (car, Ford),
(Ford, auto), (Ford, dealer).
[0034]- S-S type: for example, (car, garage), (auto, garage).

[0035]Usually, it is difficult to convert the P-P type entries shown in drawing 3(b) to a
coarser granularity. However, all other entries contain an S word that can be replaced by its
corresponding higher-level semantic concept. This reduces the size of the co-occurrence index
and speeds up query processing. The index size is reduced as follows. For each S-P type entry
(w_i, X), every entry of the form (w_i, X) in drawing 3(b) whose w_i corresponds to the
semantic concept Sem_i is replaced by (Sem_i, X) in drawing 4(c). The corresponding document
lists are merged as well. The same procedure applies to P-S type entries. As shown in drawing
4(c), the entries (Ford, car) and (Ford, auto) are replaced by (Ford, Sem1). Similarly, the
entries (Ford, dealer) and (Ford, showroom) are replaced by (Ford, Sem2). This merge mechanism
is illustrated in drawing 5(a) and (b).
[0036]S-S type entries are merged by the following two methods.

[0037]- Single merge: a many-to-one or one-to-many merge, as shown in drawing 5(a) and (b).
For example, the entries (car, dealer), (automobile, dealer), and (auto, dealer) are replaced
by (Sem1, dealer). The algorithm is the same as the one used for the S-P and P-S types.

[0038]- Compound merge: a many-to-many merge, as shown in drawing 5(c). For example, the
entries (car, dealer), (automobile, showroom), and (auto, SalesOffice) are replaced by
(Sem1, Sem2). The algorithm for this type of merge is as follows.
[0039]1. For each S-S type entry (w_i, X), replace every entry of the form (w_i, X) in drawing
3(b) whose w_i corresponds to the semantic concept Sem_i with (Sem_i, X) in drawing 4(c).

[0040]2. For each entry of the form (Sem_i, w_j), replace every such entry whose w_j
corresponds to the semantic concept Sem_j with (Sem_i, Sem_j).

[0041]Note that step 2 may also be performed before step 1. Steps 1 and 2 of this algorithm
may be repeated until nothing remains to be merged.

[0042]When two or more entries are merged, the syntactic word lists of the entries are
likewise merged by a UNION operation.
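
The merge of co-occurrence entries can be sketched in a few lines. This is a simplified
illustration, not the implementation of the invention: it applies steps 1 and 2 in a single
pass by replacing the dictionary word on either side of a pair, it treats pairs as unordered,
and it unions the document lists of merged entries. The document identifiers and the function
name are hypothetical.

```python
WORD_TO_CONCEPT = {
    "car": "Sem1", "auto": "Sem1", "automobile": "Sem1",
    "dealer": "Sem2", "showroom": "Sem2", "SalesOffice": "Sem2",
}

# Hypothetical co-occurrence table in the style of drawing 3(b): pair -> documents.
PAIRS = {
    ("car", "dealer"): {1},
    ("automobile", "showroom"): {2},
    ("auto", "SalesOffice"): {3},
    ("Ford", "car"): {4},
    ("Ford", "auto"): {5},
    ("Toyota", "Avalon"): {6},  # P-P type: cannot be coarsened
}

def merge_cooccurrence(pair_index, word_to_concept):
    """Replace dictionary words by their concepts and union the merged lists."""
    merged = {}
    for (a, b), docs in pair_index.items():
        # Words without a concept (proper nouns) are kept unchanged.
        key = tuple(sorted((word_to_concept.get(a, a), word_to_concept.get(b, b))))
        merged.setdefault(key, set()).update(docs)
    return merged

MERGED = merge_cooccurrence(PAIRS, WORD_TO_CONCEPT)
```

The six original entries collapse to three: one compound S-S entry, one P-S entry, and the
untouched P-P entry.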

[0043]Multi-granularity indexing can be implemented on top of an OODBMS. In such an
implementation, the tables in drawing 2(a), drawing 3(a), and drawing 4(c) are classes that
hold the contents; the other tables are classes that hold only pointers. Update, deletion, and
insertion operations on the indexes are performed by the OODBMS through automatic monitoring
and maintenance or through programs that propagate changes between classes. The
multi-granularity indexes are maintained incrementally, and no reorganization is needed.

[0044]Next, for the example of this invention, the additional storage overhead required to
support the index tables based on semantic concepts, over and above the conventional
word-based indexes, is estimated. As described above, the tables in drawing 4 are introduced
for efficient query processing. First, the storage for the indexes used by a conventional IR
system, i.e., the tables in drawing 2, is estimated. Let D be the number of documents in a
given collection. Let W be the number of words in the collection that are in the dictionary
(counted after removing stop words and grouping words by stemming), and let V be the number of
words that are not in the dictionary. Let w be the average number of dictionary words per
document, v the average number of non-dictionary words per document, and d the average number
of documents per word. The size of an index is measured by its number of entries (i.e., the
number of lines) and by the overall size of the table (i.e., the number of pointers). Note
that every entry of a table is represented as pointer data. Given these parameters, the size
of the table in drawing 2(a) is expressed by the following formulas (1) and (2).

[0045] 

The number of lines [2 (a)] =D ... (1) 
Whole size [2 (a)] =(1+v+w) D ... (2). 

[0046]Note that the term (1+v+w) arises because each line needs one pointer to identify the
document, v pointers on average to represent the non-dictionary words in its word list, and w
pointers on average to represent the dictionary words in its word list. Similarly, the size of
the table in drawing 2(b) is expressed by the following formulas (3) and (4).
[0047] 

The number of lines [2 (b)] =W+V ... (3)
Whole size [2 (b)] =(1+d)(W+V) ... (4).

[0048]Each line of this table needs, on average, d pointers that serve as document identifiers
in the document list, and one pointer to the word itself.
[0049]Next, the storage overhead of the on-line dictionary and the syntactic co-occurrence
table required to support basic query expansion is estimated. Let f be the compression factor
obtained by grouping the dictionary words into semantic concepts. Thus f can be regarded as
the average number of words grouped into one concept. The size of the table in drawing 3(a) is
expressed by the following formulas (5) and (6).
[0050] 

The number of lines [3 (a)] =W/f ... (5) 
Whole size [3 (a)] =W+W/f ... (6). 

[0051]Formula (5) holds because the storage for dictionary words is compressed by the
compression factor f. Formula (6) shows that W pointers are needed to represent the words in
the word lists and W/f pointers are needed to represent the semantic identifiers. In the worst
case, the size of the table in drawing 3(b) is expressed by the following formulas (7) and
(8).
[0052] 

The number of lines [3 (b)] =V(V-1)/2+VW+W(W-1)/2 ... (7)
Whole size [3 (b)] =(1+2+q)(V(V-1)/2+VW+W(W-1)/2) ... (8).

[0053]In formula (7), the first term corresponds to co-occurrence relations of the P-P type,
the second term to the S-P or P-S type, and the last term to the S-S type. q denotes the
average number of entries in the document list of each co-occurrence relation. In formula (8),
three further pointers are needed per line: one to represent the syntactic term identifier and
two for the two words in the co-occurrence relation.
[0054]Next, the storage overhead of the multi-granularity indexing of this invention, which
groups each set of semantically similar terms into one unique semantic concept, is estimated.
To calculate the size of the index tables shown in drawing 4, the average number of documents
per semantic concept and the average number of semantic concepts per document must be
estimated. It can be shown that the average number of documents per semantic concept is larger
than d, but does not reach f*d, because two or more terms of one document may be reduced to a
single semantic concept. On the other hand, the average number of concepts per document does
not exceed w. It can be shown that this number is in fact close to w. Based on these
parameters, the additional storage overhead of multi-granularity indexing can be calculated.
The calculation for the table in drawing 4(a) is as follows.
[0055] 

The number of lines [4 (a)] =D ... (9) 
Whole size [4 (a)] =(1+v+w) D ... (10). 

[0056]That is, the size is the same as that of the table in drawing 2(a). The size of the table
in drawing 4(b), on the other hand, is as follows.

[0057] 

The number of lines [4 (b)] =W/f ... (11)
Whole size [4 (b)] =(1+df)(W/f) ... (12).

[0058]Since words are unified into semantic concepts, the number of entries for dictionary
words decreases by the factor f. However, the number of documents per semantic concept
increases by almost the same factor. As a result, the size of this table is about the same as
that of the table in drawing 2(b). Note that the tables in drawing 4(a) and (b) are the tables
in drawing 2(a) and (b), respectively, at a higher-level granularity. Finally, the storage for
the table in drawing 4(c) is estimated as shown in the following formulas (13) and (14).
[0059] 

The number of lines [4 (c)] =V(V-1)/2+V(W/f)+W(W-1)/(2f^2) ... (13)
Whole size [4 (c)] =(1+2+q)V(V-1)/2+(1+2+qf)V(W/f)+(1+2+qf)W(W-1)/(2f^2) ... (14)

[0060]Essentially, all S-S type, S-P type, and P-S type co-occurrence entries are compressed by
the factor f, so this table is substantially smaller than the table in drawing 3(b).
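
The storage estimates in formulas (2), (4), (6), (8), (10), (12), and (14) can be evaluated
with a short sketch. The parameter values below are hypothetical and chosen only to show the
arithmetic; the function names are likewise assumptions of this example.

```python
def baseline_sizes(D, W, V, w, v, d, f, q):
    """Whole sizes (in pointers) of the tables needed for basic query expansion."""
    pairs = V * (V - 1) / 2 + V * W + W * (W - 1) / 2  # formula (7)
    return {
        "2a": (1 + v + w) * D,        # formula (2)
        "2b": (1 + d) * (W + V),      # formula (4)
        "3a": W + W / f,              # formula (6)
        "3b": (1 + 2 + q) * pairs,    # formula (8)
    }

def multigranularity_sizes(D, W, V, w, v, d, f, q):
    """Whole sizes (in pointers) of the concept-level tables of drawing 4."""
    return {
        "4a": (1 + v + w) * D,                                   # formula (10)
        "4b": (1 + d * f) * (W / f),                             # formula (12)
        "4c": ((1 + 2 + q) * V * (V - 1) / 2                     # formula (14)
               + (1 + 2 + q * f) * V * (W / f)
               + (1 + 2 + q * f) * W * (W - 1) / (2 * f ** 2)),
    }

# Hypothetical parameter values, for illustration only.
PARAMS = dict(D=2000, W=1000, V=500, w=20, v=10, d=30, f=4, q=2)
BASE = baseline_sizes(**PARAMS)
MULTI = multigranularity_sizes(**PARAMS)
```

For these values the worst-case table of drawing 4(c) needs fewer than half the pointers of
the worst-case table of drawing 3(b); the actual ratio depends on f, q, and the split between
W and V.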

[0061]In the end, this invention needs all the tables described above except the table in
drawing 3(b). Basic query expansion, on the other hand, needs all the tables in drawing 2 and
drawing 3. Therefore, adopting the method of this invention increases the storage cost by the
storage for the tables in drawing 4(a) and (b), but because the table in drawing 4(c) is
smaller, this increase is partially offset. The exact saving depends on the values of the
parameters described above. Even in the worst case, the additional storage is considerably
less than twice the storage needed when basic query expansion is used.

[0062]The indexing technique described above assumes that a word has only a single meaning.
However, a word usually has two or more meanings. For example, the word "bank" can be
interpreted as a financial institution or as a riverside. To account for words with two or
more meanings, a word in the semantic word list (shown in drawing 3) may belong to two or more
of the concept numbers shown in drawing 4(a). For example, "bank" may be associated with Sem10
and Sem20. To expand a query with such multiple meanings in mind, when a query includes a word
that belongs to several different concept numbers, each of those concept numbers should be
taken into account when the query is processed.
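
The handling of a word with several meanings can be sketched as follows. The concept numbers
Sem10 and Sem20 come from the example in the text; the other concept number, the document
lists, and the function names are hypothetical.

```python
# A word may map to several concept numbers (e.g. "bank").
WORD_TO_CONCEPTS = {
    "bank": ["Sem10", "Sem20"],  # financial institution, riverside
    "loan": ["Sem11"],           # hypothetical concept number
}

# Hypothetical concept -> document-list index in the style of drawing 4(b).
CONCEPT_INDEX = {
    "Sem10": {1, 2},
    "Sem20": {3},
    "Sem11": {2, 3, 4},
}

def docs_for_word(word):
    """Union the document lists of every concept the word belongs to."""
    docs = set()
    for concept in WORD_TO_CONCEPTS.get(word, []):
        docs |= CONCEPT_INDEX.get(concept, set())
    return docs

def conjunctive_query(words):
    """Intersect the per-word results, considering all meanings of each word."""
    lists = [docs_for_word(w) for w in words]
    return set.intersection(*lists) if lists else set()

RESULT = conjunctive_query(["bank", "loan"])
```

Both meanings of "bank" contribute candidates here; a later ranking step based on the original
query words would be one place to separate the intended meaning from the others.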

[0063]In the description above, the indexing technique is implemented on top of NEC's OODBMS,
and the words in the semantic word list are associated with concept numbers by pointers. No
redundant data is stored, and the storage cost of the pointers is very low. WordNet provides
the synonyms of a word for each of its senses and ranks the senses according to how frequently
they are used. For example, "bank" is interpreted more often as a financial institution than
as a riverside. The current implementation uses the most common semantic interpretation.
However, the data structure is extensible.

[0064]Semantic groupings other than those described above can also be considered. Drawing 4
considers only query expansion by synonyms. Other types of semantic relaxation, such as ISA
and IS_PART_OF, can also be considered. Multiple tables of the form shown in drawing 4(a) can
be generated for the different semantic groupings (for example, one for ISA and one for
IS_PART_OF). Alternatively, a single table can be used for the different semantic groupings.
When a query is expanded by both synonyms and hypernyms, lookups are performed on multiple
tables.

[0065]To cope with the word mismatch problem, a query processing technique needs to expand the
words of a query with related words. As a result, an additional task may be performed: ranking
the documents by their relevance to the words of the original query. In this invention,
processing of an expanded query is therefore provided as three tasks: query expansion, query
processing, and ranking of results.
[0066]First, query expansion is explained. Drawing 6 shows an example of query expansion under
the conventional query expansion technique. The query "retrieve documents containing the words
car and dealer" is modified by adding the words related to car and dealer. The semantically
similar words and the words in syntactic co-occurrence relations are determined using the
tables in drawing 3. Drawing 7 shows an example of query expansion under the multi-granularity
query expansion technique of this invention. The multi-granularity technique converts the
words car and dealer into the concepts Sem1 and Sem2 using the table in drawing 3(a). After
the words have been converted into their corresponding higher-level semantic concepts, the
table in drawing 4(c) is used to expand the semantic concepts to include syntactic relations,
and to expand the proper nouns in the original query to include related words from the
co-occurrence table.

[0067]Given a query Q containing both words that are in the dictionary and words that are not,
Q is expressed by the following formula (15).

[0068]

Q=(s_1 **...** s_m) ** (p_1 **...** p_n) ... (15)

In formula (15), s_i denotes a word in the dictionary and p_j denotes a word not in the
dictionary. The query Q contains m words that are in the dictionary and n words that are not.
Given such a query, multi-granularity query expansion is performed as follows.
[0069]1. Replace every $s_i$ (i = 1, ..., m) in Q with its corresponding higher-level semantic
concept, obtained from the table shown in (a) of drawing 3. Each concept substituted in this
way is written $C_i$.
[0070]2. For every $C_i$ (i = 1, ..., m) obtained in Step 1, extend Q by looking up and adding
the words that have a syntactic relation to $C_i$, using the table shown in (c) of drawing 4.
An S-S type entry contributes an additional concept, and an S-P type entry contributes a
proper noun.
[0071]3. For every $p_j$ (j = 1, ..., n), extend Q by looking up and adding the words that
have a syntactic relation to $p_j$, using the table shown in (c) of drawing 4. A P-S type
entry contributes an additional concept, and a P-P type entry contributes a proper noun.
[0072]4. Remove redundant words and concepts from Q.
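The four steps above can be sketched in Python as follows. This is only an illustrative
sketch: the two dictionaries stand in for the tables of drawing 3 (a) and drawing 4 (c), and
their contents, like the names WORD_TO_CONCEPT, COOCCURRENCE, and expand_query, are
assumptions made for the example, not part of the specification.

```python
# Hypothetical stand-in for the word-to-concept table in (a) of drawing 3.
WORD_TO_CONCEPT = {
    "car": "Sem1", "auto": "Sem1", "automobile": "Sem1",
    "dealer": "Sem2", "showroom": "Sem2",
}

# Hypothetical stand-in for the syntactic relation table in (c) of drawing 4.
# Keys and values are concepts (S) or proper nouns (P).
COOCCURRENCE = {
    "Sem1": ["Ford", "Buick"],  # S-P entries
    "Sem2": ["Ford", "Buick"],  # S-P entries
    "Ford": ["Sem2"],           # P-S entry
}


def expand_query(words):
    """Expand a list of query words into concepts and related entities."""
    in_dictionary = [w for w in words if w in WORD_TO_CONCEPT]
    proper_nouns = [w for w in words if w not in WORD_TO_CONCEPT]

    # Step 1: replace each dictionary word by its higher-level concept.
    concepts = [WORD_TO_CONCEPT[w] for w in in_dictionary]
    expanded = concepts + proper_nouns

    # Step 2: add the entities syntactically related to each concept.
    for concept in concepts:
        expanded.extend(COOCCURRENCE.get(concept, []))

    # Step 3: add the entities syntactically related to each proper noun.
    for noun in proper_nouns:
        expanded.extend(COOCCURRENCE.get(noun, []))

    # Step 4: remove redundant entries, keeping the first occurrence.
    return list(dict.fromkeys(expanded))
```

With these assumed tables, expand_query(["car", "dealer"]) yields two concepts and two proper
nouns instead of every synonym of "car" and "dealer".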
[0073]Compared with a query extended by the conventional technique, a query extended by this
invention is more compact and has fewer items to check, because the words of the query are
converted into entities at the coarser degree of fragmentation. As a result, the cost of
processing a query extended by this invention is smaller. Next, the number of entities (words
or concepts) introduced by query extension is estimated for the prior art and for the query
extension of this invention with two or more degrees of fragmentation. As mentioned above, f
denotes the mean number of dictionary words grouped under one higher-level semantic concept.
Here, let g be the mean number of words syntactically related to a given word that are in the
dictionary (that is, that map to a higher-level semantic concept), and let h be the mean
number of proper nouns syntactically related to a given word. Then the number of words in Q
under basic query extension (BQ) is the sum of the expansions generated in Steps 1, 2, and 3,
as shown in the following formula (16).
[0074]

$\mathrm{Words}[BQ] = mf + m(g + h) + n(g + h)$ ... (16)

The first term arises because each of the m dictionary words is replaced by f semantically
similar words. The second term arises because (g + h) co-occurring words, some in the
dictionary and some not, are added for each of the m dictionary words. The third term
corresponds to the (g + h) co-occurring words added for each of the n proper nouns. Similarly,
the number of words and concepts in Q under query extension with two or more degrees of
fragmentation (MGQ) is expressed by the following formula (17).
[0075]

$\mathrm{Words}[MGQ] = m + m(g/f + h) + n(g/f + h)$ ... (17)

The essential difference is the appearance of the compression factor f, which arises because a
higher-level semantic representation is used for each group of similar words. Therefore, the
number of words and concepts in a query is strictly smaller with query extension over two or
more degrees of fragmentation than with the basic query extension technique. If the number of
proper nouns per word in the table of (c) of drawing 4 is small, the complexity of a query
under the technique of this invention is reduced by roughly the factor f.
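As a quick numeric check of formulas (16) and (17), the following Python sketch evaluates both
counts. The function names and the sample values of m, n, f, g, and h are assumptions chosen
only for illustration; they do not come from the specification.

```python
def words_bq(m, n, f, g, h):
    """Number of words under basic query extension, formula (16)."""
    return m * f + m * (g + h) + n * (g + h)


def words_mgq(m, n, f, g, h):
    """Number of words/concepts under multi-granularity extension, formula (17)."""
    return m + m * (g / f + h) + n * (g / f + h)


# Assumed sample values: a two-word query with no proper nouns.
m, n, f, g, h = 2, 0, 4, 8, 2
print(words_bq(m, n, f, g, h))   # 28
print(words_mgq(m, n, f, g, h))  # 10.0
```

With these assumed values the extended query shrinks from 28 entities to 10, and the gap widens
as f grows.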

[0076]Next, query processing is explained. In query processing based on conventional strict
matching, retrieval ends as soon as it is found that the search predicate of the query cannot
be satisfied. In actual IR this is not appropriate, because search is based on similarity. In
particular, users want to see results that match their search criteria only partially.
Therefore, a query with N words requires N lookups, regardless of the Boolean conditions in
the search predicate. Because partial matching is supported, ranking must be performed after
query processing. The ranking technique needs to know which words in each document match the
query and the frequency of those words within the document.

[0077]Here, the lookup cost of processing a query is analyzed for the two techniques. The
processing costs differ fundamentally because of the following two factors.
[0078]- The number of words in basic query extension is larger than the number of words in
query extension with two or more degrees of fragmentation.
[0079]- The number of entries in the tables on which lookups are performed differs between
the two techniques.

[0080]Here, the lookup cost of the query Q described above is estimated. It is assumed that
each table is organized as a balanced search structure, so that the cost of a table lookup
grows logarithmically with the number of rows in the table. Using the estimates above, the
lookup cost of executing Q under basic query extension is given by the following formulas (18)
and (19).
[0081]

$\mathrm{LookupCost}(Q, BQ) = mf \cdot \log(\mathrm{rows}[2(b)]) + (m+n)(g+h) \cdot \log(\mathrm{rows}[3(b)])$ ... (18)
$= mf \cdot \log(W+V) + (m+n)(g+h) \cdot \log(V(V-1)/2 + VW + W(W-1)/2)$ ... (19)

[0082]Similarly, the lookup cost of executing Q under query extension with two or more degrees
of fragmentation is given by the following formulas (20) and (21).
[0083]

$\mathrm{LookupCost}(Q, MGQ) = m \cdot \log(\mathrm{rows}[4(b)]) + (m+n)(g/f+h) \cdot \log(\mathrm{rows}[4(c)])$ ... (21)

[0084]In MGQ, the number of lookups for dictionary words is reduced by the factor f, and the
two tables on which lookups are performed are smaller. It is therefore clear that the cost of
query processing in MGQ is smaller than the cost in BQ.
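The comparison can be illustrated with a small Python sketch of formulas (19) and (21). The
function names, the natural logarithm, and all sample values (including the row counts passed
for tables 4 (b) and 4 (c)) are assumptions for illustration; W and V are used as defined
earlier in the specification.

```python
import math


def lookup_cost_bq(m, n, f, g, h, W, V):
    """Lookup cost under basic query extension, formula (19)."""
    rows_semantic = W + V
    rows_cooccurrence = V * (V - 1) / 2 + V * W + W * (W - 1) / 2
    return (m * f * math.log(rows_semantic)
            + (m + n) * (g + h) * math.log(rows_cooccurrence))


def lookup_cost_mgq(m, n, f, g, h, rows_4b, rows_4c):
    """Lookup cost under multi-granularity extension, formula (21).

    rows_4b and rows_4c are the row counts of the tables in (b) and (c)
    of drawing 4; they are passed in because the text gives no closed form.
    """
    return (m * math.log(rows_4b)
            + (m + n) * (g / f + h) * math.log(rows_4c))


# Assumed sample values only.
bq = lookup_cost_bq(m=2, n=0, f=4, g=8, h=2, W=1000, V=100)
mgq = lookup_cost_mgq(m=2, n=0, f=4, g=8, h=2, rows_4b=1100, rows_4c=100000)
print(mgq < bq)  # True for these values
```

The saving comes from two places: fewer lookups (m instead of mf, and g/f instead of g) and
smaller tables inside the logarithms.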

[0085]Next, the ranking method of this invention is explained. In the query processing stage,
the representation of words at the coarser degree of fragmentation is used to eliminate
unrelated documents. However, the candidate documents all satisfy the same two conditions,
namely that "car" and "dealer" are contained at the coarser level of fragmentation, so they
all have the same rank. This is not desirable as a query result. Therefore, in the ranking
stage, the original words in each candidate document are accessed and used for ranking.

[0086]Drawing 8 shows four candidate documents whose keywords satisfy the following condition.

[0087]Condition: (Sem1 ** Ford ** Buick) ** (Sem2 ** Ford ** Buick).

[0088]The original keywords that matched are retrieved for ranking. Thus ("car", "dealer"),
("auto", "dealer"), ("auto", "sales office"), and ("Ford", "showroom") are used to rank the
degree of relevance.

[0089]The candidate documents are ranked based on the degree of relaxation that was needed for
the words in a document to match the words in the query.
[0090]For example, the degrees of relaxation are ordered E < Se < Sy < X (that is, strict
matching < semantic relaxation < syntactic relaxation < no matching). Results obtained using
words relaxed to a higher degree are more likely to include documents unrelated to what the
user meant by the query words. However, the ordering and the definition of the degrees of
relaxation may be chosen freely according to the requirements of the application. The smaller
the relaxation needed to find a candidate document, the higher its rank. The highest rank is
given to the document containing the words "car" and "dealer", shown at the bottom of drawing
8, because the candidate's words match the query words exactly. The second highest rank is
given to the document containing "auto" and "dealer", because semantic relaxation (that is,
replacing a query term with a semantically related term) is needed for only one word, to match
the query word "car". The remaining ranks are assigned as shown in drawing 8.
[0091]The ranking technique is based on the following two criteria.
[0092]- For a keyword of a given query Q, suppose that the relation between the keyword and
Word1 in Doc1, Word2 in Doc2, Word3 in Doc3, and Word4 in Doc4 is, respectively, strict
matching, matching by semantic query relaxation, matching by syntactic query relaxation, and
no matching. The documents are then ranked in the order Doc1 > Doc2 > Doc3 > Doc4.
[0093]- For M documents $\mathrm{Doc}_i$ (i = 1, ..., M), let $\mathrm{Match}_i$ (i = 1, ...,
M) be the rank (score) based on the number of keywords in $\mathrm{Doc}_i$ that match the
query. When $\mathrm{Match}_1 > \mathrm{Match}_2 > \mathrm{Match}_3 > \cdots >
\mathrm{Match}_{M-1} > \mathrm{Match}_M$, the documents are ranked $\mathrm{Doc}_1 >
\mathrm{Doc}_2 > \mathrm{Doc}_3 > \cdots > \mathrm{Doc}_{M-1} > \mathrm{Doc}_M$.
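The two criteria can be combined in a short Python sketch. How the criteria are combined is an
assumption of this example (more matched keywords first, then less total relaxation); the
specification states the criteria separately, and the names RELAXATION and rank_candidates
are hypothetical.

```python
# Degrees of relaxation in the order E < Se < Sy < X used in the text.
RELAXATION = {"E": 0, "Se": 1, "Sy": 2, "X": 3}


def rank_candidates(candidates):
    """Rank documents from best to worst.

    candidates maps a document id to a list of relaxation labels, one per
    query keyword, describing how that keyword was matched in the document.
    """
    def sort_key(item):
        labels = item[1]
        matched = sum(1 for label in labels if label != "X")
        total_relaxation = sum(RELAXATION[label] for label in labels)
        # Assumed combination: more matches first, then smaller relaxation.
        return (-matched, total_relaxation)

    return [doc for doc, _ in sorted(candidates.items(), key=sort_key)]


# The four candidates of drawing 8 for the query ("car", "dealer").
docs = {
    "ford_showroom": ["Sy", "Se"],
    "auto_dealer": ["Se", "E"],
    "car_dealer": ["E", "E"],
    "auto_sales_office": ["Se", "Se"],
}
print(rank_candidates(docs))
# ['car_dealer', 'auto_dealer', 'auto_sales_office', 'ford_showroom']
```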

[0094]Applying the ranking technique described above to a query with two keywords produces the
two-dimensional rank graph shown in drawing 9 for a document search with a two-word query. If
the query is not extended, only the documents in slot (E, E) are retrieved. If both semantic
and syntactic extension of the query are used, all related documents are retrieved except
those in slot (X, X).

[0095]This rank graph is represented as a matrix. For a query with N terms, the rank graph is
represented by an N x 4 matrix M(i, j) (i = 0 ... N, j = 0 ... 3). For example, the rank graph
of drawing 9 is represented as the matrix M(i, j) (i = 0 ... 2, j = 0 ... 3). The slots (E, E),
(Se, E), (Se, Sy), and (X, X), for example, are represented in the matrix as slots (3, 3),
(2, 3), (2, 1), and (0, 0), respectively. With this representation, each document can be
ranked easily as follows.

[0096]- For m from 0 to 3, the documents in a slot (n, m) are ranked with a higher score than
the documents in a slot (i, j) (i = 0 ... n, j = 0 ... 3).
[0097]- The rank score of the documents in a slot (n1, m1) is greater than or equal to the
score of the documents in a slot (n2, m2) when n1 >= n2 and m1 >= m2.
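The slot encoding and the comparison rule of [0097] can be sketched in Python as follows. The
names LEVEL, to_slot, and dominates are hypothetical; the numeric codes (E = 3, Se = 2,
Sy = 1, X = 0) are read off the slot examples given in the text.

```python
# Numeric code of each degree of relaxation, as implied by the examples
# (E, E) -> (3, 3), (Se, E) -> (2, 3), (Se, Sy) -> (2, 1), (X, X) -> (0, 0).
LEVEL = {"E": 3, "Se": 2, "Sy": 1, "X": 0}


def to_slot(labels):
    """Convert per-keyword relaxation labels into a numeric slot."""
    return tuple(LEVEL[label] for label in labels)


def dominates(slot_a, slot_b):
    """True when slot_a scores at least as high as slot_b under [0097].

    This holds when every coordinate of slot_a is >= that of slot_b.
    """
    return all(a >= b for a, b in zip(slot_a, slot_b))


print(to_slot(("Se", "Sy")))       # (2, 1)
print(dominates((3, 3), (2, 3)))   # True
print(dominates((2, 3), (3, 2)))   # False: these slots are not comparable
```

Slots such as (2, 3) and (3, 2) are not ordered by this rule, which is why a further grouping
into classes is needed to rank them.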
[0098]This rank graph representation can be realized with a commercial visualization tool. For
example, the visualization method called Cone Trees can be adapted by adding depth for a
three-dimensional rank representation. For details, refer to "Information Visualization Using
3D Interactive Animation" by G. Robertson et al., Communications of the ACM, Vol. 36, No. 4,
pages 57-71, April 1993.

[0099]With this ranking technique, results in slots in the upper part of drawing 9 are ranked
with higher scores than results in the lower part. However, it is difficult to rank results in
slots that belong to the same class in drawing 9. Drawing 10 shows how such ranking is
performed. The result slots are further grouped into classes, and slots of the same class are
given the same rank.

[0100]Query processing by this invention is performed progressively, class by class, using the
class structure shown in drawing 10. Consider the case in which a user issues a query with two
keywords and requests the top 50 results. Referring to drawing 10, the query processor first
generates the search results in class 0. If there are more than 50 results, the query
processor can stop without performing any query extension task. If class 0 yields fewer than
50 results, the query processor generates the results in class 1 (for example, slots (2, 3)
and (3, 2)). If the total number of results (for example, in class 0 and class 1 together)
exceeds 50, the query processor stops without further query processing. Note that the results
within slots (2, 3) and (3, 2) can also be generated progressively. That is, the query
processor can generate the results of slot (2, 3) first and, if the total then exceeds 50,
stop without generating results for slot (3, 2). The query processor continues generating
results from the remaining slots and classes in this way until the total number of search
results exceeds 50 or the last class is reached.

[0101]If, in the above example, the user indicates that one keyword is more important than the
others, the order in which the query processor searches the result slots is modified
accordingly. For example, when the user specifies that keyword 1 is more important than
keyword 2, the horizontal order of query processing within a class is as shown in drawing 11.
That is, in this example the query processor first generates the search results for slot
(3, 2). Then, if the total number of search results is still less than 50, it generates the
results for slot (2, 3).
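The class-by-class processing described above can be sketched in Python as follows. Several
points are assumptions of this sketch rather than statements of the specification: the class
of a slot is taken to be its total relaxation (so slot (3, 3) is class 0 and slots (2, 3) and
(3, 2) are class 1), "exceeds 50" is read as "reaches at least top_k", and fetch_slot is a
hypothetical callback that returns the documents of one slot.

```python
from itertools import product


def slots_by_class(num_keywords=2, max_level=3):
    """Group all slots into classes by total relaxation (an assumption)."""
    classes = {}
    for slot in product(range(max_level, -1, -1), repeat=num_keywords):
        cls = num_keywords * max_level - sum(slot)
        classes.setdefault(cls, []).append(slot)
    return [classes[c] for c in sorted(classes)]


def progressive_search(fetch_slot, top_k=50, important=None, num_keywords=2):
    """Generate results class by class and stop once top_k are collected.

    important is the index of a keyword the user marked as more important;
    slots that keep that keyword closer to a strict match are visited first.
    """
    results = []
    for slots in slots_by_class(num_keywords):
        if important is None:
            ordered = sorted(slots)  # e.g. (2, 3) before (3, 2)
        else:
            ordered = sorted(slots, key=lambda s: s[important], reverse=True)
        for slot in ordered:
            if not any(slot):
                continue  # slot (X, X): no keyword matches at all
            results.extend(fetch_slot(slot))
            if len(results) >= top_k:
                return results  # enough results; skip remaining slots
    return results
```

Calling progressive_search(fetch_slot, top_k=50, important=0) visits slot (3, 2) before slot
(2, 3), which corresponds to treating keyword 1 as the more important one.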
[0102]Drawing 12 shows the physical configuration of a system in which this invention is
implemented. The system contains a database 1206 that stores a collection of documents. The
database contains an index 1208 for storing concepts (for example, semantic or syntactic
concepts) and their relations to the collection of documents. The system further contains an
indexer 1210 for generating the index 1208, including the concepts at a higher degree of
fragmentation and their relations to the collection of documents. A processor 1204 receives
the query specified by the user via a user interface 1202. The processor 1204 then processes
the query and performs the ranking function. The ranked query results are returned and
displayed to the user via the user interface 1202.

[0103]A person skilled in the art will understand that operation of this invention is not
restricted to the example illustrated in drawing 12. Indeed, a person skilled in the art can
obtain the same effect using other alternative hardware environments without departing from
the scope of this invention. For example, the various functions may be performed by separate
**** elements (for example, query processing and the ranking function are performed by
different components), or by a single element (for example, a single processor performs
indexing, query processing, and the ranking function).

[0104]In short, given a set of input documents, a dictionary containing definitions, and the
keywords of a query, this invention preserves the original effectiveness (precision and
recall) of retrieval. It provides a new technique for indexing across two or more degrees of
fragmentation (saving index space) and for efficiently supporting query extension in query
processing (saving processing time).

[0105]Because queries are simplified by the indexing technique and the query processing
technique of this invention, which span two or more degrees of fragmentation, the index that
holds the relations between words becomes smaller and the processing time of a query becomes
shorter. Because the ranking technique of this invention is based on the words originally
present in the documents, the consistency of the ranking results is maintained.
[0106]It is clear from the foregoing disclosure and teachings that a person skilled in the art
can make various other changes and modifications to this invention. Therefore, although this
specification describes only some examples of this invention, various modifications can be
made without departing from the spirit and scope of this invention.
[0107] 

[Effect of the Invention]According to this invention, an index of small size is used and
efficient query extension is performed in order to solve the problem of word mismatch and the
resulting inefficiency of query processing. Specifically, the query can be extended
conceptually rather than physically, using words that are semantically similar to or
syntactically related to the words specified in the query, so that fewer related documents
are missed as a result.

[Field of the Invention]This invention relates generally to the field of indexes and queries
applied to collections of documents in a database. More specifically, it relates to reducing
the size of the index used for efficient query extension and processing, and to query
extension and progressive query processing.




[Description of the Prior Art]Conventional document retrieval systems are based on a common
principle and methodology: documents are classified, and queries are applied to them.
Documents are usually indexed manually and a priori by an expert or a librarian using a
controlled vocabulary. Documents may also be indexed based on the words they contain. A user
retrieves documents by specifying words chosen from the available terms, connected with
suitable Boolean operators. Systems of this type use a strict matching strategy. Although this
approach has the advantages of being simple and highly precise, it gives rise to the word
mismatch problem.

[0003]The word mismatch problem in information retrieval arises when the author of a document
uses one word to express a concept and a user specifies the same concept with a different word
in a query. Drawing 1 shows that the words used in HyperText Markup Language (HTML) documents
related to "car (passenger car)" and "dealer (store)" may differ among documents. Languages
other than HTML, such as Extensible Markup Language (XML) and Standard Generalized Markup
Language (SGML), are also used. When a user uses the words "automobile (car)" and "dealer
(store)" in a query, the word mismatch problem can prevent some of the target documents from
being retrieved.

[0004]In this specification, because the search targets are assumed to be mainly in English,
each element of a query used for search is written in English. However, these elements can be
expressed in any language according to the user's needs. Where necessary, the Japanese meaning
of an element is given in parentheses following the English element. The Japanese in
parentheses serves only to explain the meaning of the query element and does not affect the
result of the query.

[0005]Query extension has been suggested as a technique for solving this problem. This
approach extends a query by using, as words in the query, words with similar meanings (for
example, synonyms or other words with related meanings) and syntactically related words (for
example, a group of words that appear together in the same document with at least a fixed
frequency are syntactic co-occurrence words). In this way, query extension increases the
likelihood of matching the words in related documents. With query extension, a query
containing the words "car dealer (store of a passenger car)" is extended to include terms with
the same meaning, as follows.
[0006]Line 1. ("car (passenger car)" OR "automobile (car)" OR "auto (car)" OR "sedan (sedan)") OR
Line 2. ("Ford (Ford car)" OR "Buick (Buick car)") AND
Line 3. ("dealer (store)" OR "Showroom (showroom)" OR "SalesOffice (sales store)").

[0007]The above example contains two types of query extension. The extension in lines 1 and 3
adds words related by definition to "car" and "dealer"; that is, semantically similar words
are added. "automobile", "auto", and "sedan" are words with meanings similar to "car."
Similarly, "Showroom" and "SalesOffice" are words with meanings similar to "dealer." The other
type of query extension is shown in line 2. It is based on syntactic co-occurrence relations.
Many of the words used on the World Wide Web (also called simply the web) are in fact proper
nouns that are not found in a term dictionary. Examples of such proper nouns are Ford, Buick,
NBA, and NFL (National Football League). As mentioned above, syntactic co-occurrence relations
are derived by analyzing how frequently two words appear together in the same document. This
rests on the assumption that two words are likely to be related when they frequently appear in
the same document. For example, words that co-occur with "Ford" include "dealer (store)",
"body shop (body factory)", "Mustang (Mustang: name of a car made by Ford)", and "Escort
(Escort: name of a car made by Ford)".
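The frequency analysis behind syntactic co-occurrence relations can be sketched in Python as
follows. The function name, the min_count threshold, and the sample documents are assumptions
for illustration; the text only states that words appearing together in the same document
above a fixed frequency are treated as related.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(documents, min_count=2):
    """Return word pairs that appear together in at least min_count documents.

    documents is a list of word lists, one list per document.
    """
    counts = Counter()
    for words in documents:
        # Count each unordered pair once per document.
        unique_words = sorted(set(words))
        counts.update(combinations(unique_words, 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical sample documents.
documents = [
    ["Ford", "dealer", "Mustang"],
    ["Ford", "dealer"],
    ["Buick", "showroom"],
]
print(cooccurring_pairs(documents))  # {('Ford', 'dealer'): 2}
```

Because every pair of words is a potential entry, the table grows quadratically with the
vocabulary, which is the index size problem the specification goes on to discuss.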

[0008]To support query extension, indexes of words associated by definition and by syntactic
relations such as co-occurrence information must be maintained appropriately. An index of
words related by definition is organized as a hierarchical structure, a semantic network, or
hierarchical clusters of related words. For the hierarchical structure, refer to "Facilitating
Multimedia Database Exploration through Visual Interfaces and Perpetual Query Reformulations"
by W. Li et al., Proceedings of the 23rd International Conference on Very Large Data Bases,
pages 538-547, Athens, Greece, August 1997. For the semantic network, refer to "Nouns in
WordNet: A Lexical Inheritance System" by G. A. Miller, International Journal of Lexicography
3(4), pages 245-264, 1990. For hierarchical clusters of related words, refer to "The SMART and
SIRE Experimental Retrieval Systems" by G. Salton et al., pages 118-155, McGraw-Hill, New
York, 1983. Because syntactic relations such as syntactic co-occurrence are represented as
binary relations, a syntactic-relation index is extremely large. Several techniques have been
proposed to address this problem. For such techniques, refer to "Use of syntactic context to
produce term association lists for text retrieval" by G. Grefenstette, Proceedings of the
Fifteenth Annual International ACM SIGIR Conference, Denmark, 1992; "Query Expansion Using
Local and Global Document Analysis" by J. Xu et al., Proceedings of the 19th Annual
International ACM SIGIR Conference, Zurich, Switzerland, 1996; and "Guessing Morphology from
Terms and Corpora" by C. Jacquemin, Proceedings of the 20th Annual International ACM SIGIR
Conference, Philadelphia, Pennsylvania, USA, 1997. These techniques include analysis of
occurrence frequency and the use of morphological rules (for example, converting every word
to its base form) or a term dictionary.
[0009]Considerable research on the word mismatch problem has been done in the field of
information retrieval (IR). Refer to "Introduction to Modern Information Retrieval" by G.
Salton et al., McGraw-Hill Book Company, 1983; "Automatic Text Processing: The Transformation,
Analysis, and Retrieval of Information by Computer" by G. Salton, Addison-Wesley Publishing
Company, Inc., 1989; and "Readings in Information Retrieval" by K. Sparck Jones et al., Morgan
Kaufmann, San Francisco, California, USA, 1997.

[0010]However, most of this research focuses on retrieval criteria such as precision and
recall. Some research has suggested how to support query extension efficiently (refer to
"Automatic Query Expansion Using SMART" by C. Buckley et al., Proceedings of the 3rd Text
Retrieval Conference, Gaithersburg, Maryland, 1993) or has suggested indexing mechanisms, but
two problems still remain without a satisfactory solution. The first problem is that the index
becomes very large, because a collection of documents (for example, the web) contains many
distinct words, many of them proper nouns, and each word has many semantically similar words
and syntactically related words. The second problem is that the processing cost of a query
becomes high, because the query is extended with additional words.

[0011]These problems become increasingly pronounced when dealing with document information
collected from the web, because the number of documents grows dramatically and the words used
are very diverse, inconsistent, and occasionally wrong (for example, typing errors). One study
found that almost all user queries on the web contain two words. Refer to "Providing
Government Information on the Internet: Experiences with THOMAS" by B. Croft et al.,
Proceedings of Digital Libraries (DL'95), 1995. However, if query extension is used, queries
become substantially longer. As a result, most existing search engines on the web cannot
provide a query extension function.

[0012]Existing research in the field of query extension is outlined here. Query extension has
attracted considerable attention in the field of IR. However, attention so far has centered on
evaluating how much query extension improves retrieval criteria (namely, precision and
recall). Other research has focused on building dictionaries to identify a set of similar
terms for a given query word. Previous research, however, has not addressed the problem of
processing a query efficiently once it has been extended, or of reducing the size of the index
used for query extension and processing. Ranking documents on the basis of both strict
matching and similarity matching also remains a difficult problem.
[0013]SMART is one of the best-known advanced information retrieval systems. Refer to
"Experiments with a fast algorithm for automatic classification" by R. T. Dattola, Chapter 12
of The SMART Retrieval System - Experiments in Automatic Document Processing, edited by Gerard
Salton, Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1971, and "The SMART and SIRE
Experimental Retrieval Systems" by G. Salton et al., cited above. In SMART, each document is
represented by a term vector. Each position of the vector represents the weight (importance)
of the corresponding term in the document, so that a set of M documents with N distinct terms
is represented by an M x N matrix. A query is also represented as a term vector. Document
retrieval is based on computing the similarity given by the cosine between the query vector
and each document vector. INQUERY is another well-known system. Refer to "TREC and TIPSTER
experiments with INQUERY" by J. Callan et al., Information Processing and Management,
3:327-332, 1995.
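The vector-space retrieval described for SMART can be sketched in Python as follows. This is a
generic cosine-similarity example, not SMART's actual implementation; the function names and
the sample term weights are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vector, document_matrix):
    """Return document row indices, most similar to the query first.

    document_matrix is an M x N list of lists: M documents, N terms.
    """
    scored = [(cosine(query_vector, row), i)
              for i, row in enumerate(document_matrix)]
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical weights over three terms, e.g. ("car", "dealer", "weather").
document_matrix = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
print(rank_by_cosine([1, 1, 0], document_matrix))  # [1, 0, 2]
```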

[0014]Latent semantic indexing (LSI) is a technique that relies on conceptual indexes derived
statistically, in the manner of a dictionary, instead of on matching individual terms. Refer
to "Indexing by latent semantic analysis" by R. Harshman et al., Journal of the American
Society for Information Science, 41:391-407, 1990, and "Computational Methods for Intelligent
Information Access" by M. W. Berry et al., Proceedings of the 1995 ACM Conference on
Supercomputing, 1995. LSI assumes that there is some hidden, that is, latent, structure in
word usage and that this structure must be brought out by analyzing the occurrence of words
in documents. A document is therefore treated as a vector in a very large term space, and each
element of the vector represents the occurrence frequency of a specific term in the given
document. More refined measures based on global and local weighting may also be used. A
truncated singular value decomposition (SVD) is used to estimate the structure of word usage
across documents. Refer to "Matrix Computations" by G. Golub et al., 2nd edition, Johns
Hopkins, Baltimore, Maryland, USA, 1989. Retrieval is then performed using the database of
singular values and vectors obtained from the truncated SVD. In preliminary evaluations of
LSI, this retrieval approach is regarded as a coarser criterion than one based on individual
terms.

[0015]Automatic query extension has long been suggested as a technique for dealing with the
word mismatch problem. Refer to "Query Expansion Using Lexical-Semantic Relations" by E.
Voorhees, Proceedings of the 17th Annual International ACM SIGIR Conference, Dublin, Ireland,
1994. In one approach, a query is extended using a thesaurus to improve the likelihood that
words match within related documents. Research has shown that the improvement obtained by
using only a general-purpose thesaurus is limited. Many innovative techniques have also been
proposed. Refer to "Query Expansion Using Domain Adapted, Weighted Thesaurus in an Extended
Boolean Model" by O. Kwon et al., Proceedings of the 3rd International Conference on
Information and Knowledge Management, 1994; "Concept Based Query Expansion" by E. Voorhees,
Proceedings of the 16th Annual International ACM SIGIR Conference, Pittsburgh, Pennsylvania,
USA, 1993; "Query Expansion Using Lexical-Semantic Relations" by E. Voorhees, cited above; and
"Computational Methods for Intelligent Information Access" by M. W. Berry et al., cited above.
Experiments have found that automatic query extension improves retrieval effectiveness by 7%
to 25% on average. Refer to "Automatic Query Expansion Using SMART" by C. Buckley et al.,
cited above.

[0016]A query can also be improved by including syntactically related words. This approach
clusters words based on co-occurrence information within documents and uses these clusters to
extend the query. Because co-occurrence information is a binary relation, such an index is
always very large. One group used a compilation of co-occurrence statistics on word variants
to modify or generate a stemmer, and showed that this is advantageous compared with an
approach using only morphological rules. Refer to "Corpus-Specific Stemming Using Word Form
Co-occurrence" by W. B. Croft et al., Proceedings of the Fourth Annual Symposium, 1994. Each
of the above techniques, which extend a query term to a set of semantically related terms, is
called global analysis. In query extension, terms obtained from relevance feedback are also
added to the query to improve retrieval effectiveness. Refer to "Improving retrieval
performance by relevance feedback" by G. Salton et al., Journal of the American Society for
Information Science, 41(4):288-297, June 1990. This is called local analysis. Previous
research shows that applying a global analysis technique that uses word context and phrase
structure to a local set of documents gives search results that are more effective and more
reliable than simple local feedback. For details, refer to "Query Expansion Using Local and
Global Document Analysis" by J. Xu et al., cited above.
[0017]However, as mentioned above, previous research does not aim at solving the problem of
processing a query efficiently once it has been extended, or at reducing the size of the index
used for query extension and query processing.






JP,2000-137738,A [EFFECT OF THE INVENTION] 



[Effect of the Invention]According to this invention, efficient query expansion is performed
with a small index in order to solve the problem of word mismatch and the resulting
inefficiency of query processing. Specifically, a query can be expanded conceptually rather
than physically, using words that are semantically similar or syntactically related to the
words specified in the query, so that fewer relevant documents are missed.






[Problem(s) to be Solved by the Invention]The purpose of this invention is to provide a
method and a device that perform efficient query expansion using a small index and that
process continuous queries, in order to solve the problem of word mismatch and the resulting
inefficiency of query processing. More specifically, the query is expanded conceptually
rather than physically, using words that are semantically similar or syntactically related
to the words specified in the query, so that fewer relevant documents are missed.
[0019]To support query expansion, indexes of semantically related words and of words in
syntactic co-occurrence relations must be maintained, and two problems then become
important. The first is the size of the index tables, and the second is the overhead of
query processing. This invention also aims to solve these problems.






[Means for Solving the Problem]According to this invention, a concept of information at
multiple granularities, together with a processing structure based on it, is used to support
query expansion. The invention comprises an indexing phase, a query processing phase, and a
ranking phase. In the indexing phase, semantically similar words are grouped into one
concept; because the semantic concepts form a coarser granularity, the actual index becomes
smaller. During query processing, the words in the query are mapped to the corresponding
semantic concepts and syntactic extensions, using a dictionary and the contents of the
actual data, so that the original query is expanded logically and the processing overhead is
avoided. Next, the words of the original query are used to rank the documents obtained as
search results on the basis of exact matching, semantic matching, and syntactic matching,
and are also used to process continuous queries.
[0021] 

[Embodiment of the Invention]A preferred embodiment of the method and device for expanding a
query efficiently according to this invention is described in detail below with reference to
the accompanying drawings. Although the following description refers to NEC's PERCIO
object-oriented database management system (OODBMS), note that this invention is not limited
to it. The invention can be applied to a variety of database systems and document
collections.

[0022]This invention provides effective indexing and processing support for query expansion
by introducing the concept of multiple granularities. After stemming the words with an
available technique, the approach of this invention builds indexes of semantically similar
words and of syntactically related words. For such techniques, see "Query Expansion Using
Local and Global Document Analysis" by J. Xu et al., in the proceedings of the 19th Annual
International ACM SIGIR Conference, Zurich, Switzerland, 1996, and "Guessing Morphology from
Terms and Corpora" by C. Jacquemin, in the proceedings of the 20th Annual International ACM
SIGIR Conference, Philadelphia, Pennsylvania, USA, 1997. The approach reduces the size of
the index by merging several entries (tuples) into one entry at a higher level of
granularity. During query processing, the tuples carrying the higher-granularity information
are used to retrieve relevant documents. The original query words, which are at the finer
granularity, are then used to rank the documents retrieved during query processing, on the
basis of exact matching, semantically similar matching, and syntactically related matching.
By using indexes and a query processing technique with multiple granularities, the index can
be made smaller and query processing faster while the overall accuracy of the retrieval
mechanism is maintained.
[0023]First, it is explained how the notion of multiple granularities is adapted to the
conventional indexing used by almost all IR systems. Next, the storage overhead of
multi-granularity indexing is estimated for a given set of documents.

[0024]To look up a given word easily and obtain its document list, and likewise to extract
the set of words associated with a retrieved document, a conventional IR system maintains
indexes. Note that the term "document" here refers to a text, an image, or a combination of
text and images.

[0025]Drawing 2 shows examples of such indexes. The table in drawing 2(b) is the transposed
form of the table in drawing 2(a). For ease of explanation the indexes are shown as tables
in drawing 2; in an actual environment they are implemented, for example, as classes on top
of NEC's PERCIO OODBMS. As an example, when a user issues a query with the words "car" and
"dealer", the IR system fetches the document lists from the corresponding rows of the table
in drawing 2(b). The answer to the query is the intersection of the document lists obtained
from the two rows. This IR approach clearly supports only exact matching: relevant documents
that contain terms with similar meanings, such as "automobile dealer", "car showroom", or
"automobile showroom", cannot be retrieved. With query expansion, a dedicated utility
expands the query ("car" and "dealer") into (("car" or "automobile") and ("dealer" or
"showroom")). This approach is feasible, but it imposes considerable overhead on query
processing. In particular, instead of the two lookups in the index table of drawing 2(b) for
the words of the original query, several lookups are needed, one for each semantically
similar word. In addition, a thesaurus tool such as an on-line dictionary is required to
expand the query terms into semantically similar terms. Based on these observations, this
invention provides a more effective method of supporting query expansion when searching a
set of documents.
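
The difference in lookup cost between exact matching and physical query expansion can be
illustrated with a minimal Python sketch. The index contents, the document IDs, and the
function names exact_match and expanded_match are hypothetical and are not taken from
drawing 2; the sketch only assumes a word-to-document-list index of the kind shown in
drawing 2(b).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical word -> document-ID index in the spirit of drawing 2(b).
INDEX: Dict[str, Set[int]] = {
    "car": {1, 2},
    "automobile": {3},
    "dealer": {1, 3},
    "showroom": {2, 4},
}


def exact_match(words: List[str]) -> Set[int]:
    """Exact matching: one lookup per query word, then intersect the lists."""
    lists = [INDEX.get(word, set()) for word in words]
    return set.intersection(*lists) if lists else set()


def expanded_match(groups: List[List[str]]) -> Tuple[Set[int], int]:
    """Physical expansion: OR inside each synonym group, AND across groups.

    Returns the matching documents and the number of index lookups performed.
    """
    lookups = 0
    result: Optional[Set[int]] = None
    for group in groups:
        docs: Set[int] = set()
        for word in group:
            docs |= INDEX.get(word, set())
            lookups += 1  # every added synonym costs one more lookup
        result = docs if result is None else result & docs
    return (result if result is not None else set()), lookups


if __name__ == "__main__":
    print(exact_match(["car", "dealer"]))
    print(expanded_match([["car", "automobile"], ["dealer", "showroom"]]))
```

In this toy data the expanded query finds more documents than the exact query, but it needs
four lookups instead of two, which is the overhead the paragraph above refers to.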
[0026]As stated earlier, to avoid a mismatch between the user's vocabulary and the authors'
vocabulary, query expansion is needed that expands the query with words of similar meaning
and with syntactically related words.
[0027]Drawing 3 shows the additional data structures needed to facilitate query expansion in
a conventional IR system. Drawing 3(a) shows a table derived from an on-line dictionary, in
which words are grouped into concepts of semantically similar words. The table shown in
drawing 3 is simplified for explanation. For example, the group of similar terms "car",
"auto", "automobile", and "sedan" is represented as one symbolic entity, Sem1. Unlike
semantic similarity, which is based on a dictionary or a thesaurus, syntactic relations in
IR are determined by the document collection itself. In particular, word co-occurrence
information is used to relate two words syntactically. Drawing 3(b) illustrates an index
holding this information. The auxiliary indexes of drawing 3, used together with the
conventional IR indexes of drawing 2, support a basic query expansion technique in an IR
system. Basically, given a user's query, the word list of the query is expanded to include
semantically similar words and syntactically related words.
[0028]The method described above can be used to process queries with query expansion, but its
processing overhead is high. This invention uses additional index structures that allow
queries to be processed more efficiently. The basic idea is to modify the indexes of drawing
2 and drawing 3 so that a query is expanded conceptually. That is, the word list of the query
is not expanded physically by adding semantically similar and syntactically related words to
the list. Instead, the query is expanded conceptually by replacing its words with the
corresponding higher-level semantic concepts and syntactically related (for example,
co-occurring) words. The additional index structures introduce some storage overhead.
However, because user queries are processed more efficiently, an overall saving is achieved.

[0029]As mentioned above, the index tables are modified as shown in drawing 4 in order to
process expanded queries. Specifically, the index table in drawing 4(a) is derived from
drawing 2(a) by replacing each word that is not a proper noun with its higher-level semantic
concept. The index table in drawing 4(b) is obtained by combining the words of drawing 2(b)
into their corresponding higher-level semantic concepts and merging the document lists of
the combined entries. Thus the rows for "car", "auto", "automobile", and "sedan" are
represented in drawing 4(b) by the single entry Sem1. Similarly, the rows for "dealer",
"showroom", and "SalesOffice" in drawing 2(b) are combined into one row labeled Sem2.
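
The derivation of a concept-level index from a word-level index can be sketched as follows.
This is a minimal illustration, not the implementation of the invention: the function name
merge_by_concept, the document IDs, and the rule that words without a concept (for example
proper nouns) keep their own row are assumptions made for the example.

```python
from typing import Dict, Set


def merge_by_concept(
    word_index: Dict[str, Set[int]],
    word_to_concept: Dict[str, str],
) -> Dict[str, Set[int]]:
    """Collapse a word -> documents index into a concept -> documents index.

    Words that have no concept (assumed here: proper nouns) keep their own row.
    Document lists of words mapped to the same concept are merged by set union.
    """
    concept_index: Dict[str, Set[int]] = {}
    for word, docs in word_index.items():
        key = word_to_concept.get(word, word)
        concept_index.setdefault(key, set()).update(docs)
    return concept_index


if __name__ == "__main__":
    # Hypothetical data modeled on the car/dealer example in the text.
    word_index = {
        "car": {1}, "auto": {2}, "automobile": {3}, "sedan": {4},
        "dealer": {1, 3}, "showroom": {2}, "SalesOffice": {5},
        "Ford": {1, 2},
    }
    word_to_concept = {
        "car": "Sem1", "auto": "Sem1", "automobile": "Sem1", "sedan": "Sem1",
        "dealer": "Sem2", "showroom": "Sem2", "SalesOffice": "Sem2",
    }
    for key, docs in merge_by_concept(word_index, word_to_concept).items():
        print(key, sorted(docs))
```

Eight word rows shrink to three rows here, while each merged row carries a longer document
list.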

[0030]The index of syntactically related words is usually much larger than the index of
semantically related words, for several reasons. Many words on the Web are proper nouns and
are not found in a dictionary. In an experiment in which 2,904 documents were analyzed, 42%
of the keywords were found in WordNet, an on-line dictionary containing more than 60,000
words. See "Nouns in WordNet: A Lexical Inheritance System" by G. A. Miller, International
Journal of Lexicography 3(4), pages 245-264, 1990. The remaining 58% of the words include
proper nouns and typographical errors, and these are what inflate the size of the index. In
a conventional IR system, syntactic relatedness is usually captured by co-occurrence
relations. Because co-occurrence of words within the same document is a pairwise relation,
the index has n(n-1)/2 entries in the worst case when n words are identified. Indexing
co-occurrence relations among three or more words is extremely costly because of the
enormous storage and indexing overhead.
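
The quadratic growth of a pairwise co-occurrence index can be seen in a short Python sketch.
The function name cooccurrence_index and the sample documents are hypothetical, and the
sketch assumes that a co-occurrence entry is an unordered pair of distinct words appearing
in the same document, keyed to the set of documents in which the pair occurs.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence_index(
    docs: Dict[int, List[str]],
) -> Dict[Tuple[str, str], Set[int]]:
    """Build a pair -> documents index from document word lists."""
    index: Dict[Tuple[str, str], Set[int]] = {}
    for doc_id, words in docs.items():
        # sorted(set(...)) gives each unordered pair one canonical key
        for pair in combinations(sorted(set(words)), 2):
            index.setdefault(pair, set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: ["car", "dealer", "Ford"], 2: ["car", "garage"]}
    print(len(cooccurrence_index(docs)))

    # Worst case: one document containing all n words yields n(n-1)/2 pairs.
    n = 50
    one_big_doc = {1: [f"w{i}" for i in range(n)]}
    print(len(cooccurrence_index(one_big_doc)), n * (n - 1) // 2)
```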

[0031]Let S denote the words found in the dictionary (words that are semantically
meaningful) and P all other words (proper nouns). Based on this classification into words
in the dictionary and words not in the dictionary, co-occurrence relations between words
fall into three categories.

[0032]- P-P type: for example, (Toyota, Avalon), (Acura, Legend), (Nissan, Maxima), where
the second word of each pair is a model name of the first.

[0033]- S-P type or P-S type: for example, (Buick, car), (Buick, dealer), (car, Ford),
(Ford, auto), (Ford, dealer).
[0034]- S-S type: for example, (car, garage), (auto, garage).

[0035]P-P type entries in drawing 3(b) generally cannot be converted to a coarser
granularity. Every other entry, however, contains an S word that can be replaced by its
corresponding higher-level semantic concept. This reduces the size of the co-occurrence
index and speeds up query processing. The reduction is obtained as follows. For each S-P
type entry (w_i, X), all entries (w_i, X) in drawing 3(b) are replaced by (Sem_j, X) in
drawing 4(c), where w_i corresponds to the semantic concept Sem_j. The corresponding
document lists are merged as well. The same procedure is applied to P-S type entries. As
shown in drawing 4(c), the entries (Ford, car) and (Ford, auto) are replaced by (Ford,
Sem1). Similarly, the entries (Ford, dealer) and (Ford, showroom) are replaced by (Ford,
Sem2). This merge mechanism is illustrated in drawing 5(a) and (b).
[0036]S-S type entries are merged in the following two ways.

[0037]- Single merge: a one-to-many or many-to-one merge, as shown in drawing 5(a) and (b).
For example, the entries (car, dealer), (automobile, dealer), and (auto, dealer) are replaced
by (Sem1, dealer). The algorithm is the same as the one used for S-P type and P-S type
entries.

[0038]- Compound merge: a many-to-many merge, as shown in drawing 5(c). For example, the
entries (car, dealer), (automobile, showroom), and (auto, SalesOffice) are replaced by
(Sem1, Sem2). The algorithm for this type of merge is as follows.

[0039]1. For each S-S type entry (w_i, X), replace all entries (w_i, X) in drawing 3(b) with
(Sem_j, X), as shown in drawing 4(c), where w_i corresponds to the semantic concept Sem_j.

[0040]2. For each entry of the form (Sem_j, w_k), replace all such entries with (Sem_j,
Sem_l), where w_k corresponds to the semantic concept Sem_l.

[0041]Note that step 2 may also be performed before step 1. Steps 1 and 2 of this algorithm
may be repeated until nothing remains to be merged.

[0042]When two or more entries are merged, the syntactic word lists of those entries are
merged accordingly by a UNION operation.
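
The single and compound merges can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name merge_cooccurrence is hypothetical,
pairs are treated as ordered exactly as written, the list attached to each entry is modeled
as a set of document IDs, and steps 1 and 2 are collapsed into one pass that replaces both
members of a pair at once.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def merge_cooccurrence(
    cooc: Dict[Pair, Set[int]],
    word_to_concept: Dict[str, str],
) -> Dict[Pair, Set[int]]:
    """Replace dictionary words in co-occurrence pairs by their concepts.

    P-P pairs are left untouched because neither word has a concept.
    Entries that collapse onto the same key are merged by set union.
    """
    merged: Dict[Pair, Set[int]] = {}
    for (a, b), docs in cooc.items():
        key = (word_to_concept.get(a, a), word_to_concept.get(b, b))
        merged.setdefault(key, set()).update(docs)  # UNION of the lists
    return merged


if __name__ == "__main__":
    word_to_concept = {
        "car": "Sem1", "auto": "Sem1", "automobile": "Sem1",
        "dealer": "Sem2", "showroom": "Sem2", "SalesOffice": "Sem2",
    }
    cooc = {
        ("car", "dealer"): {1},            # S-S
        ("automobile", "showroom"): {2},   # S-S
        ("auto", "SalesOffice"): {3},      # S-S
        ("Ford", "car"): {4},              # P-S
        ("Ford", "auto"): {5},             # P-S
        ("Toyota", "Avalon"): {6},         # P-P, cannot be merged
    }
    for pair, docs in merge_cooccurrence(cooc, word_to_concept).items():
        print(pair, sorted(docs))
```

Six hypothetical entries reduce to three: (Sem1, Sem2), (Ford, Sem1), and the unchanged P-P
entry.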

[0043]The multi-granularity indexing technique can be implemented on top of an OODBMS. In
such an implementation, the tables shown in drawing 2(a), drawing 3(a), and drawing 4(c) are
classes that hold the contents, and the other tables are classes that hold only pointers.
Update, delete, and insert operations on the indexes are performed by the OODBMS, either
through automatic monitoring and maintenance or through programs that propagate changes
between classes. The multi-granularity indexes are maintained incrementally, and no
reorganization is needed.

[0044]Next, an estimate is made for an example according to this invention, taking into
account the storage overhead added, on top of the conventional word-based indexes, by the
need to support index tables based on semantic concepts. As mentioned above, the tables
shown in drawing 4 are introduced for efficient query processing. First, the storage
required by the indexes of a conventional IR system, i.e., the tables shown in drawing 2, is
estimated. Let D be the number of documents in a given collection. Let W be the number of
words in the collection that are in the dictionary (counted after removing stop words and
grouping words by stemming), and V the number of words that are not in the dictionary. Let w
be the average number of dictionary words per document, v the average number of
non-dictionary words per document, and d the average number of documents per word. The size
of an index is measured by its number of entries (i.e., the number of rows) and by the total
size of the table (i.e., the number of pointers). Note that each entry of a table is
represented as pointer data. Given these parameters, the size of the table shown in drawing
2(a) is given by formulas (1) and (2).
[0045]

Number of rows [2(a)] = D ... (1)
Total size [2(a)] = (1 + v + w) D ... (2)
[0046]Note that the factor (1 + v + w) arises because each row needs one pointer to identify
the document, v pointers on average for the non-dictionary words in its word list, and w
pointers on average for the dictionary words in its word list. Similarly, the size of the
table shown in drawing 2(b) is given by formulas (3) and (4).
[0047]

Number of rows [2(b)] = W + V ... (3)
Total size [2(b)] = (1 + d)(W + V) ... (4)

[0048]Each row of this table needs, on average, d pointers that serve as document
identifiers in its document list, plus one pointer to the word itself.
[0049]Next, the storage overhead of the on-line dictionary and of the syntactic
co-occurrence table needed to support basic query expansion is estimated. Let f be the
compression factor obtained by grouping dictionary words into semantic concepts; f can thus
be regarded as the average number of words grouped into one concept. The size of the table
shown in drawing 3(a) is given by formulas (5) and (6).
[0050]

Number of rows [3(a)] = W/f ... (5)
Total size [3(a)] = W + W/f ... (6)

[0051]Formula (5) follows because the dictionary words are compressed by the factor f.
Formula (6) shows that W pointers are needed for the words in the word lists and W/f pointers
for the semantic identifiers. In the worst case, the size of the table shown in drawing 3(b)
is given by formulas (7) and (8).
[0052]

Number of rows [3(b)] = V(V-1)/2 + VW + W(W-1)/2 ... (7)
Total size [3(b)] = (1 + 2 + q) (V(V-1)/2 + VW + W(W-1)/2) ... (8)

[0053]In formula (7), the first term corresponds to P-P type co-occurrence relations, the
second term to S-P type or P-S type relations, and the last term to S-S type relations. q
denotes the average number of entries in the document list of each co-occurrence relation.
In addition, each row needs three pointers: one for the syntactic term identifier and two
for the two words of the co-occurrence relation.

[0054]Next, the storage overhead of multi-granularity indexing according to this invention,
in which each group of semantically similar terms is mapped to one unique semantic concept,
is estimated. As mentioned above, to calculate the size of the index tables shown in drawing
4, the average number of documents per semantic concept and the average number of semantic
concepts per document must be estimated. It can be shown that the average number of
documents per semantic concept is larger than d, but that it does not reach f times d,
because several terms are collapsed into a single semantic concept. On the other hand, the
average number of concepts per document does not exceed w, and it can be shown that in
practice it is close to w. Based on these parameters, the additional storage overhead of
multi-granularity indexing can be calculated. The calculation for the table shown in drawing
4(a) is as follows.
[0055]

Number of rows [4(a)] = D ... (9)
Total size [4(a)] = (1 + v + w) D ... (10)

[0056]That is, its size is the same as that of the table shown in drawing 2(a). The size of
the table shown in drawing 4(b) is as follows.

[0057]

Number of rows [4(b)] = W/f ... (11)
Total size [4(b)] = (1 + df)(W/f) ... (12)

[0058]Because words are unified into semantic concepts, the number of entries for dictionary
words decreases by the factor f. However, the number of documents per semantic concept
increases by almost the same factor. As a result, the size of this table is about the same
as that of the table shown in drawing 2(b). Note that the tables shown in drawing 4(a) and
(b) are the higher-granularity counterparts of the tables shown in drawing 2(a) and (b),
respectively. Finally, the storage estimate for the table shown in drawing 4(c) is given by
formulas (13) and (14).
[0059]

Number of rows [4(c)] = V(V-1)/2 + V(W/f) + W(W-1)/(2f^2) ... (13)
Total size [4(c)] = (1 + 2 + q) V(V-1)/2 + (1 + 2 + qf) V(W/f)
                    + (1 + 2 + qf) W(W-1)/(2f^2) ... (14)

[0060]Basically, all S-S type, S-P type, and P-S type co-occurrence entries are compressed
by the factor f, so the table is substantially smaller than the table shown in drawing 3(b).
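
The worst-case estimates of formulas (7), (8), (13), and (14) can be compared numerically.
The sketch below simply evaluates those formulas; the function names and the parameter
values (V = 1000, W = 2000, f = 4, q = 5) are hypothetical and chosen only for illustration,
not taken from any experiment.

```python
def rows_basic(V: float, W: float) -> float:
    """Formula (7): worst-case rows of the word-level co-occurrence table."""
    return V * (V - 1) / 2 + V * W + W * (W - 1) / 2


def size_basic(V: float, W: float, q: float) -> float:
    """Formula (8): three pointers plus q document pointers per row."""
    return (1 + 2 + q) * rows_basic(V, W)


def rows_multi(V: float, W: float, f: float) -> float:
    """Formula (13): rows after merging dictionary words into concepts."""
    return V * (V - 1) / 2 + V * (W / f) + W * (W - 1) / (2 * f ** 2)


def size_multi(V: float, W: float, f: float, q: float) -> float:
    """Formula (14): merged rows carry document lists of length about q*f."""
    return (
        (1 + 2 + q) * V * (V - 1) / 2
        + (1 + 2 + q * f) * V * (W / f)
        + (1 + 2 + q * f) * W * (W - 1) / (2 * f ** 2)
    )


if __name__ == "__main__":
    V, W, f, q = 1000, 2000, 4, 5  # illustrative values only
    print("rows:", rows_basic(V, W), "->", rows_multi(V, W, f))
    print("size:", size_basic(V, W, q), "->", size_multi(V, W, f, q))
```

With f = 1 the two pairs of formulas coincide, which is a quick consistency check on the
reconstruction; the P-P term is unaffected by f in both cases.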

[0061]In the end, this invention needs all of the tables except the table shown in drawing
3(b). Basic query expansion, on the other hand, needs all of the tables shown in drawing 2
and drawing 3. Therefore, the storage cost of adopting the method of this invention
increases only by the tables shown in drawing 4(a) and (b), and because the table shown in
drawing 4(c) is smaller, it partially offsets this increase. The exact amount saved depends
on the values of the parameters mentioned above. Even in the worst case, the additional
storage is considerably less than twice the storage needed by the basic query expansion
technique.

[0062]The indexing technique described above assumes that each word has a single meaning.
However, a word usually has several meanings. For example, the word "bank" can mean a
financial institution or a riverside. To account for words with several meanings, a word in
the semantic word list (shown in drawing 3) may belong to more than one of the concept
numbers shown in drawing 4(a). For example, "bank" may be associated with Sem10 and Sem20.
To expand a query with such multiple meanings in mind, when the query contains a word that
belongs to several different concept numbers, each of those concept numbers should be
considered during query processing.
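
Handling a word with several meanings can be sketched by letting a word map to a list of
concept numbers and taking every one of them into account at lookup time. The table
contents, the document IDs, and the function name docs_for_word are hypothetical; only the
association of "bank" with Sem10 and Sem20 comes from the text.

```python
from typing import Dict, List, Set

# A word may belong to several concept numbers (hypothetical contents).
WORD_TO_CONCEPTS: Dict[str, List[str]] = {
    "bank": ["Sem10", "Sem20"],  # financial institution, riverside
    "car": ["Sem1"],
}

# Concept -> documents index in the spirit of drawing 4(b) (hypothetical).
CONCEPT_INDEX: Dict[str, Set[int]] = {
    "Sem10": {1, 2},
    "Sem20": {3},
    "Sem1": {2, 3, 4},
}


def docs_for_word(word: str) -> Set[int]:
    """Union the document lists of every concept the word belongs to."""
    docs: Set[int] = set()
    for concept in WORD_TO_CONCEPTS.get(word, []):
        docs |= CONCEPT_INDEX.get(concept, set())
    return docs


if __name__ == "__main__":
    print(sorted(docs_for_word("bank")))
    print(sorted(docs_for_word("car")))
```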

[0063]In the description above, the indexing technique is implemented on top of NEC's
OODBMS, and each word in the semantic word list is linked to its concept number by a
pointer. No redundant data is stored, and the storage cost of the pointers is very low.
WordNet provides the synonyms of a word for each of its senses and ranks the senses by how
frequently they are used. For example, "bank" is interpreted more often as a financial
institution than as a riverside. The current implementation uses the most common sense.
However, the data structure is extensible.

[0064]Semantic groupings other than those described above can also be considered. Drawing 4
considers only query expansion by synonyms. Other kinds of semantic relaxation, such as ISA
and IS_PART_OF, can also be taken into account. Several tables of the form shown in drawing
4(a) can be generated for the different semantic groupings (for example, one for ISA and one
for IS_PART_OF). Alternatively, a single table can be used for the various groupings. When a
query is expanded by both synonyms and hypernyms, lookups are performed on two or more
tables.

[0065]To cope with the word mismatch problem, the query processing technique must expand the
query words with related words. As a consequence, an additional task arises: ranking the
documents by their relevance to the words of the original query. In the following, the
processing of an expanded query according to this invention is presented as three tasks:
query expansion, query processing, and ranking of results.

[0066]First, query expansion is explained. Drawing 6 shows an example of query expansion
under the conventional query expansion technique. The query "retrieve documents containing
the words car and dealer" is modified by adding words related to car and dealer. The
semantically similar words and the words in syntactic co-occurrence relations are determined
using the tables shown in drawing 3. An example of query expansion under the
multi-granularity query expansion technique of this invention is shown in drawing 7. The
multi-granularity technique converts the words car and dealer into the concepts Sem1 and
Sem2 using the table shown in drawing 3(a). After the words have been converted into their
higher-level semantic concepts, the table shown in drawing 4(c) is used to expand the
semantic concepts with their syntactic relations, and proper nouns in the original query are
expanded with related words from the co-occurrence table.

[0067]A query Q that contains both words in the dictionary and words not in the dictionary
is expressed by formula (15).

[0068]

Q = (s_1 ** ... ** s_m) ** (p_1 ** ... ** p_n) ... (15)

In formula (15), s_i denotes a word in the dictionary and p_j denotes a word not in the
dictionary. The query Q contains m words that are in the dictionary and n words that are
not. Given such a query, the multi-granularity query expansion technique proceeds as
follows.
[0069]1. Replace each s_i in Q (i = 1, ..., m) with the corresponding higher-level semantic
concept obtained from the table shown in drawing 3(a). Each concept substituted in this way
is denoted C_i.

[0070]2. For each C_i (i = 1, ..., m) obtained in Step 1, expand Q by looking up and adding
the syntactically related words using the table shown in drawing 4(c). S-S type entries
contribute additional concepts, and S-P type entries contribute proper nouns.
[0071]3. For each p_j (j = 1, ..., n), expand Q by looking up and adding the syntactically
related words using the table shown in drawing 4(c). P-S type entries contribute additional
concepts, and P-P type entries contribute proper nouns.
[0072]4. Remove redundant query words or concepts from Q.
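
The four steps above can be sketched in Python as follows. This is a minimal illustration
under stated assumptions, not the implementation of the invention: the function name
expand_query is hypothetical, the table of drawing 4(c) is modeled as a plain mapping from a
concept or proper noun to its list of syntactically related terms, each dictionary word is
assumed to map to exactly one concept, and the sample data are invented.

```python
from typing import Dict, List


def expand_query(
    dict_words: List[str],
    proper_nouns: List[str],
    word_to_concept: Dict[str, str],
    related: Dict[str, List[str]],
) -> List[str]:
    """Multi-granularity expansion of a query given as two word lists."""
    # Step 1: replace each dictionary word by its higher-level concept.
    concepts = [word_to_concept[s] for s in dict_words]
    expanded = concepts + list(proper_nouns)

    # Step 2: add terms syntactically related to each concept.
    for concept in concepts:
        expanded.extend(related.get(concept, []))

    # Step 3: add terms syntactically related to each proper noun.
    for noun in proper_nouns:
        expanded.extend(related.get(noun, []))

    # Step 4: drop redundant terms while keeping first-seen order.
    return list(dict.fromkeys(expanded))


if __name__ == "__main__":
    word_to_concept = {"car": "Sem1", "dealer": "Sem2"}
    related = {
        "Sem1": ["Sem2", "Ford", "Sem3"],
        "Ford": ["Sem1", "Sem2"],
    }
    print(expand_query(["car", "dealer"], ["Ford"], word_to_concept, related))
```

Because the terms added in Steps 2 and 3 are already concepts wherever possible, removing
duplicates in Step 4 keeps the expanded query short.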
[0073]Compared with the query extended by the conventional technique, the query extended 
by this invention is compacter and there are few items which should be checked. That is 
because the word of the query is changed into the entity in the coarser degree of 
fragmentation. As a result, the cost of the query processing of the query extended by this 
invention will become still smaller. Next, the number of the entities (a word or concept) 
introduced in the query extension which has two or more degrees of fragmentation of query 
extension of a Prior art and this invention estimates. As mentioned above, the mean number of 
the word in a dictionary by which grouping was carried out more under the semantic concept of 
an upper level is expressed with f. Here, the mean number of the proper noun which has 
syntactic relation which was semantically related with a word and which set the mean number 
of the concept of an upper level to g more, and was related with a word is set to h. Then, the 
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number of the words in Q under extension (BQ) of a fundamental query is the expansive sum 
total by which it is generated at Steps 1 , 2, and 3, as shown in the following formulas (16). 
[0074] 

$$\text{NumWords}[BQ] = mf + m(g+h) + n(g+h) \tag{16}$$

Here, the first term arises because each of the m dictionary words is replaced by f
semantically similar words. The second term arises because (g+h) co-occurring words, some in
the dictionary and some not, are added for each of the m dictionary words. The third term
corresponds to the (g+h) co-occurring words added for each of the n proper nouns. Similarly,
the number of words and concepts in Q under multi-granularity query expansion (MGQ) is given
by the following formula (17).
[0075] 

$$\text{NumWords}[MGQ] = m + m\left(\frac{g}{f}+h\right) + n\left(\frac{g}{f}+h\right) \tag{17}$$

The essential difference is the appearance of the compression factor f, which arises because a
higher-level semantic representation is used for each group of similar words. Therefore, a
query produced by multi-granularity query expansion contains strictly fewer words and concepts
than one produced by basic query expansion. If the number of proper nouns per word in the
table of (c) of drawing 4 is small, the technique of this invention reduces the complexity of
the query by approximately the factor f.
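
Formulas (16) and (17) can be checked numerically with a short sketch. The function names and the sample values of m, n, f, g, and h below are illustrative assumptions, not figures from the specification.

```python
def num_words_bq(m, n, f, g, h):
    """Formula (16): entities in Q under basic query expansion."""
    return m * f + m * (g + h) + n * (g + h)


def num_words_mgq(m, n, f, g, h):
    """Formula (17): entities in Q under multi-granularity query expansion."""
    return m + m * (g / f + h) + n * (g / f + h)


# Illustrative values: 2 dictionary words, 1 proper noun, 4 words per concept,
# 8 related dictionary words and 2 related proper nouns per word.
print(num_words_bq(2, 1, 4, 8, 2), num_words_mgq(2, 1, 4, 8, 2))
# With no related proper nouns (h = 0) the ratio equals the factor f.
print(num_words_bq(2, 1, 4, 8, 0) / num_words_mgq(2, 1, 4, 8, 0))
```

Setting h to zero makes every term of formula (16) exactly f times the corresponding term of formula (17), which is the case described in the text where the number of proper nouns per word is small.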

[0076]Next, query processing is explained. In query processing based on conventional strict
matching, retrieval ends as soon as it is found that the conditions of the search predicate of
the query cannot be fulfilled. In actual IR this is not appropriate, because retrieval is based
on similarity. In particular, users want to see results that match their search criteria only
partially. Therefore, a query with N words requires N lookups, regardless of the Boolean
conditions in the search predicate. Because partial matching is supported, a ranking step must
be added after query processing. Ranking requires information about which words in each
document match the query and the frequency of those words within the document.

[0077]Here, the cost of lookups during query processing is analyzed for the two techniques.
The lookup costs differ fundamentally because of two factors, which are shown below.

[0078]- The number of words under basic query expansion is larger than the number of words
under multi-granularity query expansion.
[0079]- The number of entries in each table on which lookups are performed differs between the
two techniques.

[0080]Here, the lookup cost of the query Q described above is estimated. It is assumed that
each table is organized as a balanced search structure, so that the cost of a table lookup
grows logarithmically with the number of rows of the table. Using the estimates above, the
lookup cost of executing Q under basic query expansion is therefore as shown in the following
formulas (18) and (19).
[0081] 

$$\text{LookupCost}(Q, BQ) = mf \cdot \log(\text{number of rows}[2(b)]) + (m+n)(g+h) \cdot \log(\text{number of rows}[3(b)]) \tag{18}$$

$$= mf \cdot \log(W+V) + (m+n)(g+h) \cdot \log\left(\frac{V(V-1)}{2} + VW + \frac{W(W-1)}{2}\right) \tag{19}$$
[0082]Similarly, the lookup cost of executing Q under multi-granularity query expansion is as
shown in the following formulas (20) and (21).
[0083] 

$$\text{LookupCost}(Q, MGQ) = m \cdot \log(\text{number of rows}[4(b)]) + (m+n)\left(\frac{g}{f}+h\right) \cdot \log(\text{number of rows}[4(c)]) \tag{21}$$

[0084]In MGQ, the number of lookups for dictionary words is reduced by the factor f, and the
two tables on which lookups are performed are smaller. It is therefore clear that the cost of
query processing in MGQ is smaller than the cost in BQ.
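
The cost comparison can be sketched numerically as below. Several things here are assumptions made only for illustration: the logarithm is taken base 2, the function and parameter names are invented, and the row counts of the tables of drawing 4(b) and 4(c) are passed in as plain numbers because this passage does not give closed forms for them.

```python
import math


def bq_rows(W, V):
    """Row counts used in formula (19) for W dictionary words and V proper nouns."""
    word_rows = W + V
    relation_rows = V * (V - 1) // 2 + V * W + W * (W - 1) // 2
    return word_rows, relation_rows


def lookup_cost_bq(m, n, f, g, h, word_rows, relation_rows):
    """Formula (18): lookup cost under basic query expansion."""
    return (m * f * math.log2(word_rows)
            + (m + n) * (g + h) * math.log2(relation_rows))


def lookup_cost_mgq(m, n, f, g, h, concept_rows, merged_relation_rows):
    """Lookup cost under multi-granularity query expansion.

    concept_rows and merged_relation_rows stand for the row counts of the
    tables in drawing 4(b) and 4(c); the values passed in are hypothetical.
    """
    return (m * math.log2(concept_rows)
            + (m + n) * (g / f + h) * math.log2(merged_relation_rows))


rows = bq_rows(1000, 100)
print(lookup_cost_bq(2, 1, 4, 8, 2, *rows))
print(lookup_cost_mgq(2, 1, 4, 8, 2, 350, 60000))  # assumed smaller tables
```

Both effects named in the text show up separately: the coefficients shrink because fewer entities are looked up, and the logarithms shrink because the merged tables have fewer rows.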

[0085]Next, the ranking method of this invention is explained. In the query processing stage,
the coarser-granularity representation of words is used to filter out unrelated documents.
However, all candidate documents then have the same rank, because each of them fulfills the
same two conditions, namely that "car" and "dealer" are contained at the coarser granularity
level. This is not desirable as a query result. Therefore, in the ranking stage, the original
words in the candidate documents are accessed and used for ranking.

[0086]Drawing 8 shows four candidate documents whose keywords fulfill the following condition.

[0087]Condition: (Sem1 ** Ford ** Buick) ** (Sem2 ** Ford ** Buick).

[0088]First, the matching keywords are retrieved for ranking. Thus ("car", "dealer"), ("auto",
"dealer"), ("auto", "sales office"), and ("Ford", "showroom") are used to rank the degree of
relevance.

[0089]The candidate documents are ranked by the degree of relaxation needed for the words in
each document to match the words in the query.
[0090]For example, the degree of relaxation is ordered as E<Se<Sy<X (that is, strict matching
< semantic relaxation < syntactic relaxation < no matching). The result of a query using words
relaxed to a higher level contains more items that are unrelated to what the user meant by the
words of the query. However, the order and definition of the degrees of relaxation may be
chosen freely according to the requirements of the application. A candidate document is ranked
higher the less relaxation was needed to find it. The highest rank is given to the document
containing the words "car" and "dealer", in the lower part of drawing 8, because the
candidate's words match the words of the query strictly. The second highest rank is given to
the document containing the words "auto" and "dealer", because semantic relaxation (that is,
replacing a term of the query with a semantically related term) is needed for only one word, in
order to match the query word "car". The remaining ranks are assigned as shown in drawing 8.
[0091]Ranking is performed based on the following two criteria.
[0092]- For a keyword of the given query Q, suppose that the relation between the keyword and
Word1 in Doc1, Word2 in Doc2, Word3 in Doc3, and Word4 in Doc4 is, respectively, strict
matching, matching by semantic query relaxation, matching by syntactic query relaxation, and no
matching. Then the documents are ranked in the order Doc1>Doc2>Doc3>Doc4.
[0093]- For M documents Doc_i (i = 1, ..., M), let Match_i (i = 1, ..., M) be the number of
keywords (the score) in Doc_i that match the query. When Match_1 > Match_2 > Match_3 > ... >
Match_{M-1} > Match_M, the documents are ranked Doc_1 > Doc_2 > Doc_3 > ... > Doc_{M-1} > Doc_M.
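
The two criteria can be combined into a single sort key, as in the sketch below. The text does not say how the criteria are weighted against each other, so ordering first by the number of matched keywords and then by total relaxation is an assumption, as are the function names. The relaxation levels assigned to the third and fourth sample documents are also assumed for illustration.

```python
# Degrees of relaxation in the order E < Se < Sy < X.
EXACT, SEMANTIC, SYNTACTIC, NO_MATCH = range(4)


def rank_key(relaxations):
    """Sort key for one document: more matches first, then less relaxation."""
    matched = sum(1 for r in relaxations if r != NO_MATCH)
    return (-matched, sum(relaxations))


def rank_documents(candidates):
    """Return document names ordered from highest to lowest rank.

    candidates maps a document name to one relaxation level per query keyword.
    """
    return sorted(candidates, key=lambda name: rank_key(candidates[name]))


# Query ("car", "dealer") against the keyword pairs listed for drawing 8.
candidates = {
    "Ford/showroom": (SYNTACTIC, SEMANTIC),     # assumed levels
    "auto/dealer": (SEMANTIC, EXACT),
    "car/dealer": (EXACT, EXACT),
    "auto/sales office": (SEMANTIC, SEMANTIC),  # assumed levels
}
print(rank_documents(candidates))
```

An application that defines the degrees of relaxation differently would only need to change the level constants and `rank_key`.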

[0094]With the ranking technique described above, a query with two keywords yields the
two-dimensional rank graph shown in drawing 9. If the query is not expanded, only the documents
in slot (E, E) are retrieved. If both semantic and syntactic expansion of the query are used,
all related documents are retrieved, that is, all documents except those in slot (X, X).

[0095]This rank graph is represented as a matrix. For a query with N terms, the rank graph is
represented by an N x 4 matrix M (i, j) (i= 0 ... N, j= 0...3). For example, the rank graph of
drawing 9 is represented as the matrix M (i, j) (i= 0...2, j= 0...3). The slots (E, E),
(Se, E), (Se, Sy), and (X, X), for instance, are represented in the matrix as the slots (3, 3),
(2, 3), (2, 1), and (0, 0), respectively. With this representation, each document can be ranked
easily as follows.

[0096]- For the documents in a slot (n, m), where m ranges from 0 to 3, the rank score is
higher than that of the documents in a slot (i, j) (i= 0 ... n, j= 0...3).
[0097]- The rank score of the documents in a slot (n1, m1) is greater than or equal to the
score of the documents in a slot (n2, m2) when n1>=n2 and m1>=m2.
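
The slot encoding and the comparison rule of [0097] can be sketched as follows. The names `LEVEL`, `slot`, and `dominates` are illustrative, and only the component-wise rule is implemented.

```python
# Matrix index for each degree of relaxation, as in the example slots:
# (E, E) -> (3, 3), (Se, E) -> (2, 3), (Se, Sy) -> (2, 1), (X, X) -> (0, 0).
LEVEL = {"E": 3, "Se": 2, "Sy": 1, "X": 0}


def slot(relaxations):
    """Map one relaxation label per query term to a matrix slot."""
    return tuple(LEVEL[r] for r in relaxations)


def dominates(a, b):
    """True when slot a scores at least as high as slot b in every term."""
    return all(x >= y for x, y in zip(a, b))


print(slot(("Se", "E")), slot(("Se", "Sy")))
print(dominates((3, 3), (2, 3)))  # exact match in both terms dominates
print(dominates((2, 3), (3, 2)), dominates((3, 2), (2, 3)))
```

Slots such as (2, 3) and (3, 2) are not ordered by this rule in either direction, so a further convention is needed to decide which of them is processed first.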
[0098]This rank graph can be rendered with commercial visualization tools. For example, the
visualization method called Cone Trees can be adapted for a three-dimensional rank
representation by adding depth. For details, refer to G. Robertson et al., "Information
Visualization Using 3D Interactive Animation", Communications of the ACM, Vol. 36, No. 4,
pages 57-71, April 1993.

[0099]With this ranking technique, results in the upper slots of drawing 9 are ranked with a
higher score than results in the lower slots. However, it is difficult to rank the results of
slots that belong to the same class in drawing 9. Drawing 10 shows how such ranking is
performed. The result slots are further grouped into classes, and slots of the same class are
given the same rank.

[0100]Query processing according to this invention is performed progressively, class by class,
using the class structure shown in drawing 10. Consider the case where a user issues a query
with two keywords and requests the top 50 results. Referring to drawing 10, the query processor
first generates the search results in class 0. If there are more than 50 search results, the
query processor can end processing without performing any query expansion task. If the number
of search results in class 0 is less than 50, the query processor generates the results in
class 1 (for example, slots (2, 3) and (3, 2)). If the total number of search results (for
example, in class 0 and class 1 together) exceeds 50, the query processor ends processing
without carrying out further query processing. Note that the results in slots (2, 3) and (3, 2)
can themselves be generated one after the other. That is, the query processor can generate the
results of slot (2, 3) first; if the total number of search results then exceeds 50, it can end
processing without generating results for slot (3, 2). The query processor continues
generating results from the remaining slots and classes in this way until the total number of
search results exceeds 50 or until it reaches the last class.

[0101]If, in the above example, the user indicates that one keyword is more important than the
other, the order in which the query processor visits the result slots is adjusted accordingly.
For example, when the user specifies that keyword 1 is more important than keyword 2, the
order of horizontal query processing within a class is as shown in drawing 11. That is, in this
example, the query processor first generates search results for slot (3, 2). Then, if the total
number of search results is less than 50, the query processor generates results for slot
(2, 3).
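
Class-by-class processing with an optional keyword priority can be sketched as below. Several details are assumptions, because drawings 10 and 11 are not reproduced here: a slot's class is taken to be its total relaxation (so class 0 is slot (3, 3) and class 1 holds (2, 3) and (3, 2)), the all-X slot is skipped, processing stops once at least `top_k` results are available, and `fetch_slot` is a hypothetical callback that returns the documents of one slot.

```python
from itertools import product


def slots_by_class(num_keywords, priority=None, top_level=3):
    """Group slots into classes and order the slots inside each class."""
    by_class = {}
    for s in product(range(top_level, -1, -1), repeat=num_keywords):
        if not any(s):
            continue  # slot (0, ..., 0): nothing matched (assumed to be skipped)
        cls = top_level * num_keywords - sum(s)  # assumed class id
        by_class.setdefault(cls, []).append(s)
    ordered = []
    for cls in sorted(by_class):
        group = by_class[cls]
        if priority is None:
            group.sort()  # default order, e.g. (2, 3) before (3, 2)
        else:
            # Least relaxation of the important keyword comes first.
            group.sort(key=lambda s: -s[priority])
        ordered.append(group)
    return ordered


def progressive_search(fetch_slot, num_keywords, top_k, priority=None):
    """Visit slots class by class and stop once top_k results are found."""
    results = []
    for group in slots_by_class(num_keywords, priority):
        for s in group:
            results.extend(fetch_slot(s))
            if len(results) >= top_k:
                return results[:top_k]
    return results


# Illustrative store: documents already assigned to slots.
store = {(3, 3): ["d1"], (2, 3): ["d2", "d3"], (3, 2): ["d4"]}
print(progressive_search(lambda s: store.get(s, []), 2, 3))
print(progressive_search(lambda s: store.get(s, []), 2, 3, priority=0))
```

In the first call slot (3, 2) is never fetched, because the request is already satisfied after slot (2, 3). Passing `priority=0` marks keyword 1 as more important, so slot (3, 2) is visited before slot (2, 3).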
[0102]Drawing 12 shows the physical configuration of a system in which this invention is
implemented. The system includes a database 1206 that stores a collection of documents. The
database includes an index 1208 that stores concepts (for example, semantic or syntactic
concepts) and their relations to the collection of documents. The system further includes an
indexer 1210 that generates the index 1208, including the concepts of higher-level granularity
and their relations to the collection of documents. A processor 1204 receives the query
specified by the user via a user interface 1202. The processor 1204 then processes the query
and performs the ranking function. The results of the query and of the ranking function are
then displayed to the user via the user interface 1202.

[0103]Those skilled in the art will understand that the operation of this invention is not
limited to the example illustrated in drawing 12. Indeed, those skilled in the art can obtain
the same effect using other alternative hardware environments without departing from the scope
of this invention. For example, the various functions may be performed by separate elements
(for example, query processing and the ranking function are performed by different
components), or by a single element (for example, a single processor performs indexing, query
processing, and the ranking function).

[0104]In short, for the given input documents, a dictionary including definitions, and a set
of query keywords, this invention preserves the original effectiveness (precision and recall)
while effectively providing multi-granularity indexing (saving index space) and a new technique
for supporting query expansion through query processing (saving processing time).

[0105]With the multi-granularity indexing and query processing techniques of this invention,
queries are simplified, so the index representing the relations between words becomes smaller
and query processing time becomes shorter. Because the ranking technique of this invention is
based on the words originally present in the documents, the ranking results remain consistent.
[0106]It is clear from the foregoing disclosure and teachings that those skilled in the art
can make various other changes and modifications to this invention. Therefore, although this
specification describes only some embodiments of this invention, various changes can be made
to this invention without departing from its spirit and scope.



JP,2000-137738,A [DESCRIPTION OF DRAWINGS] 

[Brief Description of the Drawings] 

[Drawing 1] It is a figure showing the word mismatch problem in information retrieval.
[Drawing 2] It is a figure showing an example of the index conventionally used by information
retrieval systems based on strict matching.
[Drawing 3] It is a figure showing an example of the index, for use in a conventional
information retrieval system, obtained by grouping words into semantically similar concepts
and syntactically related expansions.
[Drawing 4] It is a figure showing the index structure required by this invention in order to
perform query processing more efficiently.
[Drawing 5] It is a figure showing the process of merging the index entries of co-occurring
words.
[Drawing 6] It is a figure showing query expansion processing in a conventional information
retrieval system.
[Drawing 7] It is a figure showing query expansion processing using the multi-granularity
query expansion technique of this invention.
[Drawing 8] It is a figure showing ranking processing according to this invention.
[Drawing 9] It is a two-dimensional graph showing the ranking for a query with two words.
[Drawing 10] It is a figure showing the order of progressive query processing.
[Drawing 11] It is a figure showing the order of progressive query processing when different
levels of importance are assigned to the keywords.
[Drawing 12] It is a figure showing the physical configuration of one possible embodiment of
this invention.

[Description of Notations] 

1202 User interface 

1204 Processor 

1206 Database 

1208 Index 


1210 Indexer 

[00 18] 

f • sx7 7f«t, i&£Wt£t&7xyH?ui 

4. <fc9f¥L<i4. ^xy-fttfg^ft^y-Ffc* 

nfewtawsiu mxtmzmmcoh&v-fzzm^x. * 

[0019] ifc, ^xu— tf5J£5Bt»^5fcftfc:, 
«(= & 4 7 - F <7i <i y f >y 9 x Wim* n& 
o^WlSCtl) . lolli^ yfvtrx ■ 7—7 

)v<?y*r-4 xcowmx-fo o . 2 -?g (4^x y -iio^-- 

[0020] 

It7i-X**tr. -f y^'y^X#(t7x-XT-i4. X 
i^WtSML^V- F* 1 1 ^M±t LX9>V~7°it% 

fc, IS*W^^co 1 y^' -y • -f-f Xs&OhS 

av«s:wa£5iKv>y ty^fi, *sswt7co^xy 
izx^\ ^x 7 >?mm<n£m^ hti. mmm^9^ 

[0021] 

nmmb^zarxmm^zmm^tit . 

NECOPERI CO^y'i^ F^|fr|T— ^ ?y <-— X® 
lyXf^ (OODBMS) t^M tT^§ttl=^\ 

[00 2 2 ] *^HJ(i^ aitO$B^StfO»t:^#A-r4 
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Hi. 7-b*«XfS^ (stemming) COf^T. flJffl"BJ 

**** 1996, the 19th Annual International ACM SIGIR Conference ****, J. Xu et al., "Query Expansion Using Local and Global Document Analysis", and 1997, the 20th Annual International ACM SIGIR Conference ****, C. Jacquemin, "Guessing Morphology from Terms and Corpora" ****

corrn— Hi. iK^xyh'J (^tvh Sr. J; 

fltXW teBffitt* v v f-y^t=*^v »T?x'J -MHO 

X*/hH:<~tZ>c\btfX'*, #*o. iDIv^xij-Jd 
[00 23] «fcjfc, ^[«0*H^JS<?)«IBfc , 

x^^&m^j y^vfxttmzmmLxmmzti&cr, 
Mz-jv^mm-th. mz, rmco-xmnm&izmL 
x . im.<omftm&&t& 4 -y rxmjzn o 

[0024] I Rz/X^Ali, -%mVXhfrt>ffi 

^coy—FZ&mzmitt&tzWz^ A yf vt'XZU 
ffl*tttfi-$-<S>. r X»j fc^-3ffl»i. -r* 

[00 2 5] 02ti. >f yf'y^^M^U^S. 
H2C0 ( b ) l/Z^-fr—XMi. 02<7) ( a ) tStf 
-r/l^lRKUfc-f y*r-y<?XX"foh. H2T1i. f&ejf 
tr^Elz-t ttzib. Zfihcr>4 y-f' -y 9 Xjfi'f— 7)V<7) 

BX^tiX^h, L#>U mmcom^xii. muin 

ECWPERCIO OODBMS«±fiI<?)?5^ 

m^btih. 1 1<?)? x-V— VMS: b&k. s.—V&Mk 
MZ^ "7— K Tear (fUfflS:) j r d e a 1 e r 
(JB0IHS) J ^^T7X'J-J««J:, IRyX 
rAtt H2co (b) fio-r— r/U^JGrSfr*^ X* 



uxh^KDtfi-r, ?xij-^i±. 2-5 

<Dftfr^n^tifzJcmvxhco&ffi&'ftt%:&. zcoi 

Rt=str*rro-f-tt, sgA>^Hc »7 7 f>/ 

<^£SJSt"&i>CD"C& l 9. Automobile 
dealer ( ii&^^DlEJgJjg) j „ r c a r s h o 
wr o om (Sffl^Wv- 3 — )V— A) j „ Xi± Taut 
omobi le showroom ( @SJj^<7)i^ 3 — 

m?t%hz\>tfX'%ts:VK ^xy-ttawa. ?x>;-£ 

"car" and "dealer" ****.

("car" or "automobile")

("dealer" or "showroom") ****

iZ. @2<7) (b) £7)-f y^r-y^X ■ -r— TVHCOWC^ 
2 0*50^ -y^T-y XffAXfo *) lZ, it^^x. ij — |*)<7)y — 

fr<7»V >y?T>v 7°ifi&mz%; ^ 4 tz » * y 5 --f >"SI« 

z\tLh<7)®.mfrt>, *^Bj(i, sawan^&tiBrrftiR 

[00 26] ifefcifi^fci 3 fc, i-iTtf5B*k##<?5 
lift t O 5 x v -y ^ * jett S 7ti6 nz . S*<7)«Mt" 4 V 

- k . RVffixwmm * v- k ^fflv ^ ?x u - 

So 

[00 27 ] H3{i. I RyXfAtfcV^, ^ 

x >J -cr>$m$:mmi l z-t&<mzmnif i <J&mb tzh t-9 

TV^4. Miff, a«-T4fflB*0ffl r car (^ffl 

r a uto(^)j, Tautomobile 
(i!W0 J . 2W/ r s e dan (-tr^'>) j (i, lO 
wtWxyf^f^ semltU«§iltV^. 
SMMMSWK*^ < Xtttt&HMK tliS-^t, IR 

tfc vr&ffijcmmmte . ^m^mk^h^zi. ~>xm 

MZtl&« mz, "7- HcOft©fffg{i, 2-OC7) T 7-r-'5- 

«X«fcWilft(t*«fcHSfflS*i.&. @3 (b)ii ; 
PMMZmttzJ y"r -y 7 X%m^LX\->& , H3^a 
Wyf7?xtJt(:, m2<7M&eoi R^y^-y^x 

I RyXfAfcfc^tSS^iiS, »*eutc{i. X-if 
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[0028] 7^v-cr>$mzm^te?^v-n>imiz 
ti, ±T$coijm&mmzti&t>K i^rro-ftit 

§ 5 ilflnco -f y -r 'y ? xHBftWSffl § ti & . *SMJi0>T 

7n-f«i*flf)Mii 02^1/0 3 <tm y^-y ?x 

£ , ?x >J -^«E^Wt:J£3i^tLS J: -5 £><0 

S^SV-K^yxhrtt^i^Sifcfci^T, ?x 
y -coy- Fco y x h flaw^iest 4 wilft < , 
?x'J- £07— K£, -ecoPBi*-r£ , i Diftl^UO 

»"*«fcfetflbw«n?8ffi mui. #mmi%) w-vt 

h. zcozui. m.tocoJy'r-vfxffimzX&g.m* 
— A— A,» y FcoilimS- kfcJb-f, L^L, :x— (f<o?x 
i/— #J: 9SWWfc*BH3*t4<Z>t\ LT(iSS& 

[0 0 29] mftLfz£ 5 tzm&Ztiti 7 x- ] J-£%m 
-t&titblz^ m4t,Z7jk-tX o lz, 4 yf ypx ■ x— ~f 
MPSZXSIXh. 04 <0 (a) CSt-Of?? 

x ■ x-77Mi, #v-H (HS^ft-CSr^) £, i 
•3 _fc.fi U- ^coERWffc&co v — h* tcB § i. 4 Z b t,z _t 
ot. 02<O (a) *>6SiaiS*lS. 04cO (b) iz^k 
•tJyf-yfx ■ -f-7'7Mi, @2« (b) iZ^flfz 

^cov— Hkffi^t)^ ^i^ico;£#yxb<^xy 
bysrv-^'-r-s.-k^i^TW^nSo a-jt, r c 

a r j , r au toj. ^automobi lej ,S 
V' r s e d a n j fcttifc-f-Sfixy b U fi, 04(7) 
( b ) tlif-<?)i> MJSemlt LT^^T^ 
So IBpBIt;:. 02 (b ) <0, rdealerj, r s h 
owroomj , $tt/ r Sa 1 esOf f icej t>pf 

[0030] mXfftfcWm-t&V-ViZftt&'l >7-v 

-Vlzn-t&J y^ytxi. 9j&*S:D*&v\, -7x:/± 
colKcOV-Hti, H^TO*«rtr» 0 » SWfc(4fl.036> 

**** WordNet ****. WordNet **** 60,000 ****; see G. A. Miller, "Nouns in WordNet: A Lexical Inheritance System", International Journal of Lexicography 3(4), pages 245-264, 1990 ****

4. f£*coi R^XxAticfcVvtti, mXtfifcVG&ttW 
fi, a*, ^BBBffifcioTiiaiS^-CV^. 
ftT'-c07-Fc0ftjtB|lfiti, 1SJ1»HS-Cft4fc«), n 



il iSiTJ^-Xtli, (nX ( n-1 ) ) /2fc* 
* . E^^ia4t«k -f y-r ? xff itco*-; \'-^ -y K 
co£_i6t;_, SfflKLh^V— H<?D^jaBlflt*-f yrvtx 

[00 3 1 ] S»:lo^-5^7-N' (mWW^S«<7j 
&&kco) £SJ:L, ft&cO^TcOV-H (SW^PI) t 
Pt^-&„ BWItZfo&v-FtmWlz^v-Fb^ 

s*f3' y ft* . 

[0032] ■ P-P: for example, (Toyota, Avalon (a Toyota model)), (Acura, Legend (an Acura model)), (Nissan, Maxima (a Nissan model)).

[0033] ■ S-P or P-S: for example, (Buick, car), (Buick, dealer), (car, Ford), (Ford, auto), (Ford, dealer).

[0034] ■ S-S: for example, (car, garage), (auto, garage).
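The three tuple categories listed above can be illustrated with a minimal sketch. It assumes that "S" marks a word that carries semantics and "P" marks a proper noun; the word list and the function name `pair_type` are hypothetical and are not taken from the specification.

```python
# Hypothetical set of words that carry semantics (S). Any word not listed
# here is treated as a proper noun (P). This membership list is an assumption.
SEMANTIC_WORDS = {"car", "auto", "automobile", "dealer", "garage"}


def pair_type(first, second, semantic_words=SEMANTIC_WORDS):
    """Classify a word pair as P-P, S-P, P-S, or S-S."""
    kinds = ["S" if w.lower() in semantic_words else "P" for w in (first, second)]
    return "-".join(kinds)


pairs = [("Toyota", "Avalon"), ("Buick", "car"), ("car", "Ford"), ("car", "garage")]
for pair in pairs:
    print(pair, pair_type(*pair))
```

Only pairs that contain at least one S word can be coarsened by replacing that word with its concept, which is why the categories are kept apart.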

[00 3 5 ] MM. H3C7) ( b ) tzjprt^ X OfflV^B^ 
JKC^r S - PMoxy b y tsa»-f4 ; (i 
WC* So t *» t , ffi<7)^:T cox V h y ii , MJEf 
4 , «t O^^l^AL/PiOSWifc^ta^T'^ 6S7-Pt 
*f 4. fte-O-f'-y^Xco-f-YX^M 

y^"/9x<7y?4X<?m&\i, aTcoid^tS. s 
-PKwi, X) co#xy r-yfcMU w^illiftj: 
SemitMJE-tS idtc, H3c7) (b ) tc^Six^^: 
TtfO (wi, X) SOXy h U 5r, 04 CO ( c ) <7) ( s e 
mi, X) ZZX\ *t&t&*X<W)Ah 

i>-?— : JZti&, R^co^nl^P-sMcoxyh y 

SafflSilS. 04 CO ( c ) fc^-iatC, Jzyb'J ( F 
ord, car)h (Ford, auto) iJ, (Fo 
r d. Semi ) £gJ&£fl-S> „ xyh'J ( F 

ord, dealer) b (Ford, showroo 
m) (i, (Ford, Sem2) 

!tV-y • ^^-X'AtoV^T, 05CO ( a ) (b)^ 

[00 36] s - sMcoxy b y i±, IXFco 2-oco^g 

[00 37 ] ■ V— : 0 5 CO ( a ) ( b ) (C^rf 

m^/^n\v>?A r<D-7-^ Q mm, x 

y h 'J (car, dealer) , (automobi 
le, dealer) , ~B(Zf (auto, deale 
r) (Semi, dealer) tg^^fxS. i 

i T'ffiffl s ix 4 r 3" y x a t $ , s - p m&if p - s m 
xmm § n. s t> co t n tT-s> 4 . 
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C0 0 38] ■ m&~?-iS: M5CT, ( c ) CStio 
&\ M^f-fTW-y. Mi-ii\ xyh'J (ca 
r, dealer), (automobile, sho 
w r o o m ) „ JkXf (auto, SalesOffic 
e)fi. (Semi. S e m 2 ) tBSt^ixS. ZCD? 

[0039] 1 . S - SSLcO^y h U (Wi, X ) t 

3 CO ( b ) CO (w is X) coxyh U^T^s 04« 
( c ) fcjjstf- i 3 =3: ( s e mi , X ) icflBft-T 5 . 
[0 04 0] 2. (Semi, Wj ) ^-f/^xy b 
U fctf LX , w d £WMIfc& S e m j t *fJfrT£ i 3 
;oU^TO(Semi, Wj ) J, (Sem;, 

[0041] ±ie^T->yT2(±. ±iex^^n<7)mf(- 
nfrrs i fc fc-r* -sit tax-r^s t&s . mz. 

[0042] jn&coxy b y ^v-^'§fLS t , ^#t(c 

N I ON) H{;j;ot"7-yS^ t 

[0043] wRcouMj-ies^rt-s-f yf7 ?xft «« 

Hgt'(±. M2CO ( a ) , H3c7) (a) „ &t/IIl4<7) 
(c) t^-f-f— rt^^*t"S^9x-?'$>-i> 0 

frR[2(a» =D 
£flnMX(2(a)] = (1+v+w) 
[0046] #fitci3V>T. SS<50MSlJ<50/^is6t; 1 -PCD 

0>fc, WTw««sK>f y^jft^BfcSfii^T. (1 
?T$M2(b)) =W + V 

^ft^>f X[2(b)] = ( 1 + d) • 
[0 048] Zcr>r-—7)\s<7)%-ffl$, W?\ JMVX 

ynx*wcommTtts:z>dmcotf4>?b. v-vz 

co&coZifetl'ocotf'f >?%<&mt-$-&, 
[0049] mz. £&&fa?X.V-ffigi£3tm-?&C0 

fr»[3(a>] =W/f 
£flr»M'X(3(a)) =W+W/f 
[ 0 0 5 1 ] ( 5 ) li, SWfcfcftV-H^IEfUS 

Effi^»f tS^i^El?nSWt, iconic 
^§tL^ 0 ( 6 ) 14, WM^jK-f >^#\ 

fr»[3(b)] =V(V-l)/ 



[0044] act. ««7- h'ta-^< y?>y?x 
comiz, mmfct\ l zm~3<<iy^rv7x • t— y/i^s: 

g-^S. ir^tJtipfc:. H4^-r^-7";^\ S&sp 

setc ^com&co^Mcom^mzisnh, swtcfc&v 

— KS: (7— K ■ XtSy^ffl^t, Xb77 ■ 7— 

xyhUSc (MP*>. f3R) t^-y;U<7)^#:^-9- 
4X (ap*>. rK-f y^^ife) t^^'^T. -fyfvn 

c\ixhco?^y*—?tfH-Z-ihKht. H2« (a) fcss 
S^f-^W^ Xli, ( 2 ) fiSii 

4. 

[0045] 

Rows[2(a)] = D ... (1)
Total size[2(a)] = (1 + v + w) D ... (2).

+ v + w ) OJf t T ^ § <! t t^Kf ^ $ T£> I) . 
M2co (b) t^-r-f— X(4, 

(505^ ( 4 ) -C'^^iX^ . 
[0047] 

Rows[2(b)] = W + V ... (3)

Total size[2(b)] = (1 + d)(W + V) ... (4).

mt^-&« tot, f io«t;^-rft?n 

tz V- FOit^T^i: IS^i; & „ 0 3 CO ( a ) 

CStf- ^WW-f X{4. IilTc7)5t ( 6 ) 5 

[0050] 

Rows[3(a)] = W/f ... (5)
Total size[3(a)] = W + W/f ... (6).

ifiwm^m^m-tco^Ziimx'h & z t ^ lx \ * 

03CO (b) X^tLhf—XjVcr^A X\i, MM 

con&, lilTOit: ( 8 ) "cmzti*. 
[0052] 
Rows[3(b)] = V(V-1)/2 + VW + W(W-1)/2 ... (7)

Total size[3(b)] = (1 + 2 + q)(V(V-1)/2 + VW + W(W-1)/2) ... (8).

[ 0 0 5 3 ] 3$ ( 7 ) T"(4. m 1 H* 5 P - PMOV- H 
wftSBWRtdWS L . SS 2 JH** s - p IXIi P - s St 

ti. ftjBBBGR*afJBte<0, UXN ftox>- b U CD 

fr»[4(a» =D 
^ftt'fX(4(a)] = ( 1 + v + 

[ 0 0 5 6 ] fiP*>, *r4X\±mi<?) ( a ) tSj^f — X 
/PfcHtT&S. — 2r\ 04« (b) t^rT'T-X/UO 
fr»[4(b>] =W/f 
-£flr*MX[4(b)) = ( l + df ) 

[0 0 5 8] S?«(C2&&T7— b'OX^b UO$:(4. V— 

--fXiiu 02O (b) t^-f-f— X;bfc Initio &t>£9 
i DftV^^/boSBhgtcfc^TJi. 04 O 

fr»[3(b» =V (V- 1 ) /2 + 
) ...(13) 

X[3(b)] = ( 1 + 2 + q 
• V ■ (W/f ) + ( 1 + 2 + q f ) 

[oo6oia*eut. s-ssl s-psl xtiP- 
03O (b) CStf t i^TlIWt/h? ^ 

[006 1 ] JftftWfc, *$WftcJ:*tff , 030(b) 

x 'J HSSBSifefct , H 2 &t/H 3 t^-T-r- X/b£ 

flWfcHWN&^Mi, 04O (a) b (b) fcSrt^r 
-7'/K?)ie'ltWcOtijD^fc'ft±# < t£h if. EI4 O 
( c ) \z5fit s r—TtWi*M X#/h3 < &&OT\ Buffi 
nx b OtSJD^iM^-W^fflA-^^^^ixS . »>07£ 

4. ftl«^ti,, SlnaiHtttl;** W»?xij 

[0062] MiiCM yfv ?Xf+Jt&}£{4. r 7- HA* 
m-OSiftO^ £ ^ h Z b S {R5e L T Itlr § *iT H 
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• (V (V-D/2+VW+W (W-l 

dJ:9*:§<$ri5, £<Z)ffi3KfcL f ■ d£l±3r63r^£ 

%><Dl l Z%;%>ZbZ^tZbifiX*%& < , Z.tlt>W5* — 
«j§Jn«E«ftt-A-^ ■/ H z b 

MA CO (a) t / z^f--7Mzm~t&s\mmTco 
m^oxhh. 

[0055] 

Rows[4(a)] = D ... (9)
Total size[4(a)] = (1 + v + w) D ... (10).
•+MXJ4. \X¥comK)XfoZ>. 
[0057] 

Rows[4(b)] = W/f ... (11)
Total size[4(b)] = (1 + df) W/f ... (12).

(a) fc (b) t^-Tf— X;W4. -f-n-€ f mi2(7) 
( a ) ( b ) fcjiH-r— X/kC&S i t taK-t^S 
ftat. 040(c) CS^r— 7";i^)E«W 

o_awD(i. oto^ (13). ( 1 4 ) o=fc a tcits: 

[0059] 

Rows[4(c)] = V(V-1)/2 + V(W/f) + W(W-1)/(2f^2) ... (13)

Total size[4(c)] = (1 + 2 + q) V(V-1)/2 + (1 + 2 + qf) V(W/f) + (1 + 2 + qf^2) W(W-1)/(2f^2) ... (14)
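As a rough numerical illustration of the row counts for the word-pair index, the sketch below compares V(V-1)/2 + VW + W(W-1)/2 for the original granularity with V(V-1)/2 + V(W/f) + W(W-1)/(2f^2) for the merged granularity. Reading V as the number of proper nouns, W as the number of words with semantics, and f as the average number of words merged into one concept is an assumption, since the surrounding definitions are not legible here. The function names are hypothetical.

```python
def pair_rows_original(V, W):
    """Rows of the word-pair index at the original granularity."""
    return V * (V - 1) / 2 + V * W + W * (W - 1) / 2


def pair_rows_merged(V, W, f):
    """Rows of the word-pair index after merging f words per concept."""
    return V * (V - 1) / 2 + V * (W / f) + W * (W - 1) / (2 * f ** 2)


# Illustrative values only; they are not taken from the specification.
V, W, f = 10, 100, 4
print(pair_rows_original(V, W))
print(pair_rows_merged(V, W, f))
```

The P-P term is unchanged by merging, so the saving comes entirely from the pairs that contain at least one word with semantics.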

rbankj tV^7-Kil Allid 
ft) . 4fc(4iriSfcLT»RS*i.-6. 1Si^o«ift^*t- 
1.7- HtOV»T#JS-r -1.^46. b U X b£0 

"7— b* (03T'^$^'rv^) -hK 04O (a) 
«R««4*^fcll-r4fcWt-r*. MUZ, r b a n 
kj(4. Semi 0 i: Sem2 0tII#lt^ilSt« 

^-tS-TS 1 -2«v- b"£*tr*§i=r. ^-O^^r-SllS^S 

[00 63] ±S»WC*SV^(i. >f yf'^XgS 
(4. NECC0OODBMSCD±.&mi l zmmZflX& , 9 , 
Sift en T7 - b' U X b F*3 O 7 - K #\ ^-O- ^ t i o T «E 

•m^tixuh-f. m >?izmtMmMco?xhhtt 
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urn Mmmzm* ^MM^mmxmm l , znmm 1 
&mzti&mM< l zmtx7y7-mtt&, mm. r b 
ankjii imx^i>^mmmt tx. x^^<nm 

[0064] frc^^tutJ^ko, Eiffcto^/i—T-fc 
£#(t^Aix4 i b sftJ-CS £ . 04 TJi , J: 4 

^xU-coSt^lW^saT^So ISA, I s_ 
part_ofi?co, ffetfOM^oBifto^fn^^jt-ri.^ 

bi>T%z>. 04 (a) [z^mnvf—y'fvz , m 
^^mmcn^)v—y'it (mm. loiai sacil 

lOlil S_PART_OFtHU) tHU* 

[0065] "7— F • 7f»raSt»l>t 

H £jj£3*t isfcRjito 4 . teSWt , tcco? x y 
-cov-F test*- & WStttk: i -5 T £ 9 y ? Wt-f 

h. mu?x7&mizn?&, mz. Ksusnfc^x 

y— COJSS&J, 3-?cD?X?. BP%. ?x 

Q = ( Sj A. . . A s m ) A ( Pj 
S(15) Tli, s i (ig?«;:&.|> 1 7-F£iiU Pj{± 

s 9 x y -jgagssta* , arco * 5 izmi zti&» 

[0069] 1. QtS>£#Si ( i = 1 

m) Sr. [23C0 ( a) tc^-f — 7';^4># -£ 

zcDidizmnmt&tLf-zm&cvzti^tizcit 

[0 0 7 0] 2. X-fyTl-dft^^Ci (i = 
1, . . . . m) IZMLXffiJSm&Zli-t&V-V 
04c 7 ) ( c ) iz^t-— ~f)V Itm^X Wsb . itjtr*" 

[0071] 3. #Pj ( j = 1 n)tfttC 

ffi#«^^tl>V-KJ, H4« (c) t 

H3W- & . p - s icoi y h y ii . «te<z«Iflirtc¥5- 

T7— H» [BQ] = (mf ) +m 
Z\Z\X. mimZ, ffimzJb&mMW-FcoZttJril 



Lx$mzti&> 

[00 66 ] ^xy-<^K5B(co^TKW4. 
06«i, ^*<7)^xy-fi£?sesc7)7cr(50^xu-(7)K: 

3ScOM£7nLTI^4„ TcarhdealertVOV 
fL. cartdealer 11^3*^ 4 V— K^itjD^tl 

t v ^ . awfcwtwK-r * pray - k t , atsawr^a 
^u-mwmmcojtx^^j^u-amcnmii. muz^ 

It carideale rfcV^7-FJ, 03 c 7 ) 
( a ) t^-^— TVl^fflVVtflSfc&S em 1 tSem2 

tit'^UOaMMfcfcfciSaftLfeft, H4C0 ( c ) tcsrt 
x-r^^ffl^T. ^co«*«±^\ fltJt«Wff&*tf 

[00 67 ] SWfc&iv-HfcSWfcsSrV^v-H^IS 
(15) -c^s^n* . 

[0068] 
Q = (S_1 ∧ ... ∧ S_m) ∧ (P_1 ∧ ... ∧ P_n) ... (15)

[00 72] 4. %^3r?xy-<ai7-KXt2«E±£Q 

[00 73] *SHHfcJ:r)TStSRSn^^xU-(i. # 

DlSSS^^^xy-co^xy-Maonxhtd:, -JS 
/h§^fcf)C!3:4. iJcts ^*«Sffi«^xij-te5ft, 
«StO*i^S^^T4 ^x y -fi£?Itfc 
V^T»A§ii4x>-^^^^ C7-KXI4K*) 
MW^4 . IfiMcoJ: J; ±&]y^)Ucr,MMWi± 

<^>itx'9)v~y°it^titz. mmzfohv- vco^m 

K t=H«ft ft fetufe , fltAWHBt HW*M0T 
«&hhn„ *ZX\ S*W*^xy-ofii3g (B 
Q) cOTnT-OQtfcft^y-Hco^ti. ( 1 

6) (CSfiaCs Xf77K 2, ac/3T'5te-r4 

[0074] 

Number of words[BQ] = mf + m(g + h) + n(g + h) ... (16)

f# (g + h) m&nztL&t:)b£&.t&. M3m&. 
( g + h ) flao*@v- h zmm Lfz nm<7>m%%mco 
znmznm-tz>« mmiz. im<mftmt:ftt&7 

xy-JtSR (MGQ) c07CfcOQfc:tift-&V-Hf:«tS! 
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mm. vxr^ ( i 7 ) -cfssfts. 

7— [MGQ] =m+m ( g/ 
1 7 ) 

i i tii . ffiffl s ^.t \^ h mm v-wmzmi-x. 

^TV^, (c) O-r— TT/KiJVVC, <7— F 

[0076] 

t\ NiiWV-KSr*-fS^xy-tcWLT. N0W/F 
••/ ? T ~rtf£mX'fo D s £ #U±„ tt* 

/UyfTvy ■ nXb(Q,BQ) = mf 
b») ...(18) 

= mf 

(W-D/2) ...(19). 
[0 0 82] |5]«t, 133&«9ffl#££*^5?x'J-J£ 

/UvPTvy ■ nXb(a,MGd) =m 
4(c)]) ...(20) 

= HI 

V W/f+(W(W-l))/2f 2 ) ...(2 1). 
[0 0 84] mmiZ^i (7))]/-; 7 T-vTcr>\n\m. 

&cr>X\ MGQ^feit-S. ?x U — j?Pl<50rJX htfiB Qtz 

a \f h n x h x *) /h§ < %h coim h frx-h h . 

[0085] mz, *»»95^?ftlt*aiWVit» 

fK flU*>. iOfflVMBW-JEU-'OWCfcV^ Tear j fc 
Tdealerj £-#tf fc U d^fr^iS^-TW. I§lt 
5?y? £*Ui, ^^O-mmco^bLXiS- 

[0086] H8T1i. tAT^fr SMfc-r^f-V- F 



[0075] 

Number of words[MGQ] = m + m(g/f + h) + n(g/f + h) ... (17)

C0|*g<7) b<7)7— Ytf? x >J — t'-e y f-f S t v> -3 £ i: 
[0077] ^it, 2otf>Sa£te*iVvt, ?xiJ-J 

[0078] ■ i+s^^iy -mmiz&tf & v- 

[00 79] • /l-y^T'y TtmhtLh . -fix^'ix^T" 
-7'Vl-^x y b U gc*^ 2 hZb. 
[0 080] ZZX\ flLf^xiJ-QWl-.y^T'y 

T-ffl^b $ tlX B K> , f— ^/K^P 7?7 7 7»{± , 

tt, HT^(18), ( 1 9) c7)J;atC^I>. 
[0081] 

Lookup cost(Q, BQ) = mf log(Rows[2(b)]) + (m + n)(g + h) log(Rows[3(b)]) ... (18)

= mf log(W + V) + (m + n)(g + h) log(V(V-1)/2 + VW + W(W-1)/2) ... (19).

iaToS(20), (21)(OJ;'5m. 
[0083] 
Lookup cost(Q, MGQ) = m log(Rows[4(b)]) + (m + n)(g/f + h) log(Rows[4(c)]) ... (20)

= m log(W/f + V) + (m + n)(g/f + h) log(V(V-1)/2 + VW/f + W(W-1)/(2f^2)) ... (21).
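The two lookup-cost expressions can be compared numerically. The following is a minimal sketch that evaluates mf·log(W+V) + (m+n)(g+h)·log(V(V-1)/2 + VW + W(W-1)/2) for the conventionally expanded query and m·log(W/f+V) + (m+n)(g/f+h)·log(V(V-1)/2 + VW/f + W(W-1)/(2f^2)) for the multi-granularity query. The natural logarithm and all parameter values are assumptions, because the source does not state the base, and the function names are hypothetical.

```python
import math


def lookup_cost_bq(m, n, f, g, h, V, W):
    """Lookup cost of a conventionally expanded query."""
    rows_word = W + V
    rows_pair = V * (V - 1) / 2 + V * W + W * (W - 1) / 2
    return m * f * math.log(rows_word) + (m + n) * (g + h) * math.log(rows_pair)


def lookup_cost_mgq(m, n, f, g, h, V, W):
    """Lookup cost of a multi-granularity query."""
    rows_word = W / f + V
    rows_pair = V * (V - 1) / 2 + V * W / f + W * (W - 1) / (2 * f ** 2)
    return m * math.log(rows_word) + (m + n) * (g / f + h) * math.log(rows_pair)


# Illustrative values only; they are not taken from the specification.
params = dict(m=2, n=1, f=4, g=3, h=2, V=10, W=100)
print(lookup_cost_bq(**params))
print(lookup_cost_mgq(**params))
```

With f greater than 1, both the number of lookups and the size of each index shrink, so the second cost comes out lower for these sample values.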

[0087] Query: (Sem1 ∨ Ford ∨ Buick) ∧ (Sem2 ∨ Ford ∨ Buick).
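A query of this form can be evaluated against a coarse-granularity index as in the minimal sketch below. The sample documents and semantic groups follow the example data legible in the drawings (Doc1 to Doc4, Sem1 to Sem3). The function names, the lower-casing of words, and the representation of a query as a list of OR-clauses are assumptions made for illustration.

```python
from collections import defaultdict

# Semantic groups: concept -> words merged into it.
SEM_GROUPS = {
    "Sem1": {"car", "auto", "automobile", "sedan"},
    "Sem2": {"dealer", "showroom", "salesoffice"},
    "Sem3": {"garage", "parking"},
}
# Sample documents and their index words.
DOCS = {
    "Doc1": ["ford", "showroom"],
    "Doc2": ["auto", "salesoffice"],
    "Doc3": ["car", "dealer"],
    "Doc4": ["auto", "dealer"],
}
WORD_TO_CONCEPT = {w: sem for sem, words in SEM_GROUPS.items() for w in words}


def to_concept(word):
    """Replace a word by its concept; proper nouns stay unchanged."""
    return WORD_TO_CONCEPT.get(word, word)


def build_coarse_index(docs):
    """Build the concept -> documents index."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[to_concept(word)].add(doc_id)
    return dict(index)


def evaluate(index, clauses):
    """Evaluate a query given as a list of OR-clauses that are ANDed together."""
    result = None
    for clause in clauses:
        hits = set()
        for term in clause:
            hits |= index.get(term, set())
        result = hits if result is None else result & hits
    return result or set()


index = build_coarse_index(DOCS)
query = [{"Sem1", "ford", "buick"}, {"Sem2", "ford", "buick"}]
print(sorted(evaluate(index, query)))
```

The original words of the query are not needed for retrieval at this stage; they would only come into play afterwards, when the retrieved documents are ranked.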

[0088] wm^-/^-y^ '■ v— H^y^ft 

Jt^fcftttteRStli. fifioTs ( r carj, r de 
alerj ) , ( r autoj, r dealerj), 
( r autoj, r sales of f icej K S 
V ( r F o r d j , r s howroomj) ifi^ HBjStt 

<m^"y^^¥s\^h<r>\,zm^ih>Kh. 
[0 089] mmb^^xm^. 7xij-wot-pj 

*-TSS:«p , gtfe^Tv>yf-t/iV-F^-ov^T^W 
«s«(ck^"v 9 y ? ft tt s <t & . 

[00 90] $Ut£\ ^COgftti. E<Se<Sy< 

<77fy^D <z>jffcj£»S;fi.4. ^x»j 



— far?— KfcRiu J; ym^u^vTWLm^tifiiy- 

£>ftft(ii:\ «^ J :l,tl«5>'^)l J: 

ft. H8<7)ThIH~, r ca rj h r d e a 1 e r j i: ^ 

~?>y^-Lfzfrt>X'fo&* r a utoji: r deale 
r j fc^3!7-K£*^4:^i2#SfciS^y?j&* 
#-;t£>;fXT^ft. iftli, ?I'J- «7- b* r C arj 
fcv-yf-^ftj^, l-^xov— H<0»fc, SUfcW&Sf 

Aix#;tft) ^gk^flft^TTfcft. mcoyyftt 

mzmLxit. ms^-rx o^Thfih. 
[0091] ^y^ftftfS&W:. ot«2oco^^ 

^Tfi^ttft. 

[0 0 9 2] • 4-t^tifz^^'J-Qcri^—V-^^zm 
LX, Qi;tlDocl(:|,S^-7-KWordl, 
Doc2ttSWord2, Do c 3tfeliWo r d 
3. D o c4(Cfe§Wo r d4 t^Oia^OHf^^iX^" 

ft. mm^'y^y?'. MMm^y^v-mm^zx^ 
*t*vy. m^mt£7^)-m.mzkh^>v^y7\ r 
u^y^yy^LX'hi^. jmn. d oc i>do 

c 2>Do c 3>D o c A(T))mz=yy9¥i^%fth. 

[0093] ■ MflOlScB. Docj ( i = 
1, . . . , M) t. £SD o c^zZft^ftnfc-f 
?X»J — iZ'Vvl-t&Zr— V— b*SL Mat chi 

(i = l, . . . , M)CPI«5^ftlt(X37) 
{i. Ma t c hi >Ma t c h 2 >Ma t c h 3 . . . M 
a t c h M _j >Ma t c h M T"35ft*&n\ Doc^Do 
c 2 >Doc 3 . . . DoC(-i>DoC|, 0 

[0 0 94] 2o<?)d f -y-F£fS;L*:?xy-£ffiffl 

-rft, mi£btz7y7mmmizm~3im\ H9^^- 
£5%. 2^cr, r 7~^^-ti>^^'j-x-^m^mm-t 
h m^co 2Wtyy? mf? ? ? s ftft „ ?x'j 

—WS&SRiL&^b* vh (E, E) ftwxmiStf 

jf£m^&t. st*^0 7 h (x, x) iz^m*) . 
mm-t ft ^x co-xmmm %ft&« 

[0 0 9 5] zcD^y^Wf^^yii. ftmtLxm.% 
ftX^&o N«tf>fflB£3T$-47x!;— yy 
7ftlf7\?7l$, NX4C0PL M ( i , j ) ( i = 

o. . . n s j =0. . . 3 ) iz&^xmzti&o ma 

HQ^^fJW^Ii, ff^ljM ( i , j ) ( i 
= 0. . . 2, j=0. . . 3) fc LTSlSft-T^ft. 
For example, the slots (E, E), (Se, E), (Se, Sy), and (X, X) are represented as slots (3, 3), (2, 3), (2, 1), and (0, 0), respectively. ****
UXT<D x 5 izmmz? y? aire* ft . 

[00 96 ] ■ Xn?h (n, m) ft&JCWlZttLX. 

mWOipt, 3<7MX'htM^. zfthcoxw&yyy 
li. -Xn >y b ( i , j ) ( i = 0 . . . n , j = 
0. . . 3 ) ftCD^WX OffitVXnTtSr*. 

[0097] ■ Documents in slot (n1, m1) are ranked higher than documents in slot (n2, m2) if n1 ≥ n2 and m1 ≥ m2.
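The slot ordering for a two-word query can be sketched as follows. The numeric levels (E = 3, Se = 2, Sy = 1, X = 0) follow the slot examples in the text. Grouping slots into classes by the sum of their levels is an assumption based on what remains legible of the lattice drawing, and the function names are hypothetical.

```python
from itertools import product

# Match levels: exact, semantic, syntactic, none.
LEVELS = {"E": 3, "Se": 2, "Sy": 1, "X": 0}


def dominates(a, b):
    """True if slot a ranks at least as high as slot b in every position."""
    return all(x >= y for x, y in zip(a, b))


def slots_by_class(n_words=2, max_level=3):
    """Group slots into classes; class 0 is the exact match on every word."""
    classes = {}
    for slot in product(range(max_level + 1), repeat=n_words):
        classes.setdefault(n_words * max_level - sum(slot), []).append(slot)
    return classes


classes = slots_by_class()
for k in sorted(classes):
    print(k, classes[k])
```

Slots such as (2, 3) and (3, 2) are not comparable under `dominates`, which is why the order in which they are explored is left open.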

[0098] iO^i^flW^ 7<0*3S14, TMJiOS 
'M.itV-MzX -?Xmm%ft& . #RJi\ Cone T 
r e e s t Dftf ft* SKftfrSiM: . 3 &7U<7) ft If 

^stpw-i jwf * &iisn-$-ft ztizx^x^mafto 

**** April 1993, Communications of the ACM, Vol. 36, No. 4, pages 57-71, G. G. Robertson et al., "Information Visualization Using 3D Interactive Animation" ****
[00 99] i^V^tfWSafctza&^W*, El9tf0± 

xarxyy^mf^ftt, u&»u m9i / zm^xmt 
7?xt,zmi-&xxj y h coffin yypttirt&cvtem 

tcftbftftj^SrjSLT^fto te*Wfc^S*ifc^n-y b 
14. fit, |sit7 5xwxa 

[o i oo] ^m&zxhvzLV—wmte, Hiots 
^-^5x*i©^)iv^T. 7yxmzmmLx'ffhft&. 

T#x.S<, 01 0Sr#HB-^Si;. ?iij--7n-b7t 
ft. ^*S^5 0 i ?X>J- ■ 7Dty 

^xy-js:5fi^x^^sitf-rft zk&om&tl 
7-th <r k j6*r# . ?7X0 t;i3{tft^S*<?)ii:* s 
5 0(c«fciarv^*&. ?x«j-- ra-fe>y^ii^5xi 

(fJiif. XD'y b ( 2 , 3 ) ( 3 , 2 ) ) H^CO 

?xij- ■ ro^-y-tfii. Mfc^xU-MaSr-Tft 
Xn -y b ( 2 . 3 ) Rtf ( 3 , 2 ) rtOfeHi;i4MWt 

^s-r * - 1 ^"r-^ ft i t icmtf-^ x-fo 4.o4 

1 i?X'J--7P-t:'ytli Xn-yh (2, 3 ) tOfe 

*^ «*it*js-r ft £ t ^-ct ft . mmmcDtmtr 5 

O^S^-ft^-ir. ?xu--7n-feytli xcr.yh 

(3,2) mzmik*£mr& ;t*<, ^i^**rt- 
ft <= t § ft . ?xy- ■ 7ot >y -tm . 
*gf*^5 0 %Mz-&£X\ Xlim&cotyxizjtt&t 
X\ Wj&coXo S^xnyfM^^xK, H 
^ ft iSSO^jS^Mtt ft £ t ^'T- S ft . 



(S.6) 100-1 37738 (P2000-137738A) 



[0101] ±ie^J^\ 1 -OCT)*— T 7— vtpm?)*— 

ff. — V- H 1 (i^-V- F 2 1 OMST' 

eoJimi:, HI 1 o t^itf^f-iS. BP*>, icD 

(3.2) toBBRtfili*£jft-$-4. ifcfc. fl6H&££>& 

[0 1 0 2] II 211 ^BI^jaS^XfACO 
M-&{fc$:im-r&'r-?^-X. 1 2 0 6 £#A/C^S„ 

-fS^CM yfv^T. 1 2 0 8£-§XTV>S„ i^T" 
A|±IC, >fyf7?X1208&«L, i9±&?) 

«£*tr-f 2 0 8£±j£$-*fcj6or yf 

121 0£#tf o rn-tr >ytfl 2 0 411 f ■ 
^y^7z-Xl 2 0 2£tf-LTx— !f3&^fiS§n.5t 

?3iy—%%m~t&cDizmmzti&. /nt^ti 20 
4ii, act, ^xy-^iL, 7>7ttwmm£9m 

y^7x-Xl 2 0 2S:^-tTB^-1ft^$tL 

[0103] sii#ii *^Be^nis^\ hi 2 -cm* 
bwc*i>. mm. ssetd #WDj<z>«i&&»s>jftBH- 
(o»ij;w s fH. wa.tr, ±3*to, 
sfc^^fjtjM* 1 , sn^awawciT t>ti& ) , x 

[ 0 1 04 ] m-t^z. ^wm. ?Mtitz-xmt<zm 
-t&jf—v— Y<M&nj&&&w& ( m-tm. Rt/wm 

{ > t •/ ? xia^®^ ) t . ? x u -jsh ( mm 

[0105] *$tffl{z£ hWM.com-9i-mzs.hA 

7 xttiw&b ?xij -mm&miz z^x. ?x'j-^ 

tM Xifik *} /h§ < & 0 , ? x >j — a&BBtfBltfS < 3r 



[0106] ^i4f<50M^at^iS^*^. stft##\ 

^wiaa&tf tsHSr&ffiw-s ; t * < . *moz*t tx 

[0107] 

•thtz^z^ /bZ&A Xcr>4 yfv^SttfflU, 
S&^e^J^r ^ x u -K5IMt^^ !> . JWWW= ix . n'J 

[Brief Description of the Drawings]

[Hi] WfHteRtW-tiV-H ■ S^vf-^IWHi' 
^•THTfcSo 

[H2] mm^'vi-yx^ymmmm^x^Ax\ 
[H3] m^mmmmiyxr-^xm-thrz^z. v 

coMmk-tmx*fo&* 

[H4] *aKHt*5^t, iOSWWfc^'JHtOa* 

[H5] ftS^— yf7?m>l 'JSrV— 
•f-SMiiS:^H-t-fc^. 0 

[H6] m^mmmm^xr-Mzti^h^^v-am 

CH7] 1«ico*i*JK^*r§^x>j- 
[H8] *^HJt:J;i>, ^y^WTOil^t-HT-fe 
[H9] 2o<7) r 7-KSr*-t-l.^x';-<7)7>'^#(t^ 

[ii 0] ii^e^^xy-Ma^niff^^t-HT'fc 

[HI 1 3 ^r-V-b'^S^^coMM^tMOST 

mx-foh. 

[HI 2] *«HH^HSte^r«g=5r-H»F^^!tliaWliJS 

1202 Interface

1204 Processor

1206 Database

1208 Index

1210 Indexer
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[01] 



[Fig. 5]

(a) Tuples (Wx,1, Wy), (Wx,2, Wy), ..., (Wx,n, Wy) are merged into the single tuple (Sem_m, Wy).

(b) Tuples (Wy, Wx,1), (Wy, Wx,2), ..., (Wy, Wx,n) are merged into the single tuple (Wy, Sem_m).



[Fig. 2]

(a)
Doc1       Ford, Showroom
Doc2       Auto, SalesOffice
Doc3       Car, Dealer
Doc4       Auto, Dealer

(b)
Ford       Doc1
Showroom   Doc1
Auto       Doc2, Doc4
Dealer     Doc3, Doc4







(c) Tuples (Wx,1, Wy,1), (Wx,2, Wy,2), ..., (Wx,n, Wy,n) are merged into the single tuple (Sem_m, Sem_p).



[Fig. 7]

Original query:   Car and Dealer
Rewritten query:  (Sem1 or Ford or Buick) and (Sem2 or Ford or Buick)

[Fig. 3]

(a)
Sem1   Car, Auto, Automobile, Sedan
Sem2   Dealer, Showroom, SalesOffice
Sem3   Garage, Parking

(b)
Syn1   Buick, Car
Syn2   Car, Garage
Syn3   Auto, Garage
Syn4   Ford, Car
Syn5   Ford, Auto

[Fig. 6]

Original query:  Car and Dealer
Expanded query:  (Car or Auto or Automobile or Sedan or Ford or Garage or Buick) and (Dealer or Showroom or SalesOffice or Ford or Buick)
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(a) 



[Fig. 4]

(a)
Doc1    Ford, Sem2
Doc2    Sem1, Sem2
Doc3    Sem1, Sem2
Doc4    Sem1, Sem2

(b)
Sem1    Doc2, Doc3, Doc4
Sem2    Doc1, Doc2, Doc3, Doc4

(c)
Syn1'   Buick, Sem1
Syn2'   Sem1, Sem3
Syn3'   Ford, Sem1
Syn4'   ****
Syn5'   Ford, Buick

[Fig. 8]

Query: (Sem1 or Ford or Buick) and (Sem2 or Ford or Buick)

[Fig. 9]

(E,E) (3,3)
(Se,E) (2,3)   (E,Se) (3,2)
(Sy,E) (1,3)   (Se,Se) (2,2)   (E,Sy) (3,1)
(X,E) (0,3)   (Sy,Se) (1,2)   (Se,Sy) (2,1)   (E,X) (3,0)
(X,Se) (0,2)   (Sy,Sy) (1,1)   (Se,X) (2,0)
(X,Sy) (0,1)   (Sy,X) (1,0)
(X,X) (0,0)

Legend: E = exact match, Se = semantic match, Sy = syntactic match, X = no match




[Fig. 12]

Reference numerals shown in the drawing: 1202, 1204, 1208, 1206, 1210
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[HI 0] 

(E,E) (3,3)                                                Class 0
(Se,E) (2,3)   (E,Se) (3,2)                                Class 1
(Sy,E) (1,3)   (Se,Se) (2,2)   (E,Sy) (3,1)                Class 2
(X,E) (0,3)   (Sy,Se) (1,2)   (Se,Sy) (2,1)   (E,X) (3,0)  Class 3
(X,Se) (0,2)   (Sy,Sy) (1,1)   (Se,X) (2,0)                Class 4
(X,Sy) (0,1)   (Sy,X) (1,0)                                Class 5



[Fig. 11]

(E,E) (3,3)
(Se,E) (2,3)   (E,Se) (3,2)                                Class 1
(Sy,E) (1,3)   (Se,Se) (2,2)   (E,Sy) (3,1)                Class 2
(X,E) (0,3)   (Sy,Se) (1,2)   (Se,Sy) (2,1)   (E,X) (3,0)  Class 3
(X,Se) (0,2)   (Sy,Sy) (1,1)   (Se,X) (2,0)                Class 4
(X,Sy) (0,1)   (Sy,X) (1,0)                                Class 5



